Commit 7c40a79

Merge pull request #245 from kozistr/feature/grokfast-optimizer
[Feature] Implement GrokFast optimizer
2 parents 15d52f6 + 3b56b70 commit 7c40a79

13 files changed: +361 −75 lines

README.md (+3 −2)

@@ -10,7 +10,7 @@

 **pytorch-optimizer** is optimizer & lr scheduler collections in PyTorch.
 I just re-implemented (speed & memory tweaks, plug-ins) the algorithm while based on the original paper. Also, It includes useful and practical optimization ideas.
-Currently, **68 optimizers (+ `bitsandbytes`)**, **11 lr schedulers**, and **13 loss functions** are supported!
+Currently, **69 optimizers (+ `bitsandbytes`)**, **11 lr schedulers**, and **13 loss functions** are supported!

 Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).

@@ -165,6 +165,7 @@ supported_optimizers = get_supported_optimizers()
 | bSAM | *SAM as an Optimal Relaxation of Bayes* | [github](https://github.com/team-approx-bayes/bayesian-sam) | <https://arxiv.org/abs/2210.01620> | [cite](https://ui.adsabs.harvard.edu/abs/2022arXiv221001620M/exportcitation) |
 | Schedule-Free | *Schedule-Free Optimizers* | [github](https://github.com/facebookresearch/schedule_free) | <https://github.com/facebookresearch/schedule_free> | [cite](https://github.com/facebookresearch/schedule_free) |
 | FAdam | *Adam is a natural gradient optimizer using diagonal empirical Fisher information* | [github](https://github.com/lessw2020/fadam_pytorch) | <https://arxiv.org/abs/2405.12807> | [cite](https://ui.adsabs.harvard.edu/abs/2024arXiv240512807H/exportcitation) |
+| Grokfast | *Accelerated Grokking by Amplifying Slow Gradients* | [github](https://github.com/ironjr/grokfast) | <https://arxiv.org/abs/2405.20233> | [cite](https://github.com/ironjr/grokfast?tab=readme-ov-file#citation) |

 ## Supported LR Scheduler

@@ -325,7 +326,7 @@ If you use this software, please cite it below. Or you can get it from "cite thi
 month = jan,
 title = {{pytorch_optimizer: optimizer & lr scheduler & loss function collections in PyTorch}},
 url = {https://github.com/kozistr/pytorch_optimizer},
-version = {2.12.0},
+version = {3.0.1},
 year = {2021}
 }

docs/changelogs/v3.0.1.md (+2)

@@ -8,6 +8,8 @@
 * support not-using-first-momentum when beta1 is not given
 * default dtype for first momentum to `bfloat16`
 * clip second momentum to 0.999
+* Implement `GrokFast` optimizer. (#244, #245)
+* [Accelerated Grokking by Amplifying Slow Gradients](https://arxiv.org/abs/2405.20233)

 ### Bug
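The `GrokFast` entry in the changelog above refers to the gradient-filtering idea from the linked paper: keep an exponential moving average (EMA) of each parameter's gradient and add it back, scaled, so the slow-varying gradient component is amplified before the optimizer step. Below is a minimal, self-contained sketch of that EMA filter for illustration only; it is not the code added in this commit, and `alpha` / `lamb` simply follow the paper's notation.

```python
# Sketch of the Grokfast EMA gradient filter (arXiv:2405.20233), for illustration.
# Not pytorch_optimizer's implementation: it keeps an EMA of each parameter's
# gradient and adds it back, scaled by `lamb`, to amplify the slow component.
from typing import Dict, Optional

import torch
from torch import nn


def ema_gradient_filter(
    model: nn.Module,
    grads: Optional[Dict[str, torch.Tensor]] = None,
    alpha: float = 0.98,
    lamb: float = 2.0,
) -> Dict[str, torch.Tensor]:
    # Initialize the EMA state from the current gradients on the first call.
    if grads is None:
        grads = {
            n: p.grad.detach().clone()
            for n, p in model.named_parameters()
            if p.grad is not None
        }

    for n, p in model.named_parameters():
        if p.grad is None:
            continue
        # EMA of the gradient: mu <- alpha * mu + (1 - alpha) * g
        grads[n] = grads[n] * alpha + p.grad * (1.0 - alpha)
        # Amplify the slow component: g <- g + lamb * mu
        p.grad = p.grad + lamb * grads[n]

    return grads
```

In a training loop such a filter would sit between `loss.backward()` and `optimizer.step()`, with the returned state dict threaded through successive iterations.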

docs/index.md (+3 −2)

@@ -10,7 +10,7 @@

 **pytorch-optimizer** is optimizer & lr scheduler collections in PyTorch.
 I just re-implemented (speed & memory tweaks, plug-ins) the algorithm while based on the original paper. Also, It includes useful and practical optimization ideas.
-Currently, **68 optimizers (+ `bitsandbytes`)**, **11 lr schedulers**, and **13 loss functions** are supported!
+Currently, **69 optimizers (+ `bitsandbytes`)**, **11 lr schedulers**, and **13 loss functions** are supported!

 Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).

@@ -165,6 +165,7 @@ supported_optimizers = get_supported_optimizers()
 | bSAM | *SAM as an Optimal Relaxation of Bayes* | [github](https://github.com/team-approx-bayes/bayesian-sam) | <https://arxiv.org/abs/2210.01620> | [cite](https://ui.adsabs.harvard.edu/abs/2022arXiv221001620M/exportcitation) |
 | Schedule-Free | *Schedule-Free Optimizers* | [github](https://github.com/facebookresearch/schedule_free) | <https://github.com/facebookresearch/schedule_free> | [cite](https://github.com/facebookresearch/schedule_free) |
 | FAdam | *Adam is a natural gradient optimizer using diagonal empirical Fisher information* | [github](https://github.com/lessw2020/fadam_pytorch) | <https://arxiv.org/abs/2405.12807> | [cite](https://ui.adsabs.harvard.edu/abs/2024arXiv240512807H/exportcitation) |
+| Grokfast | *Accelerated Grokking by Amplifying Slow Gradients* | [github](https://github.com/ironjr/grokfast) | <https://arxiv.org/abs/2405.20233> | [cite](https://github.com/ironjr/grokfast?tab=readme-ov-file#citation) |

 ## Supported LR Scheduler

@@ -325,7 +326,7 @@ If you use this software, please cite it below. Or you can get it from "cite thi
 month = jan,
 title = {{pytorch_optimizer: optimizer & lr scheduler & loss function collections in PyTorch}},
 url = {https://github.com/kozistr/pytorch_optimizer},
-version = {2.12.0},
+version = {3.0.1},
 year = {2021}
 }

docs/optimizer.md (+12)

@@ -156,6 +156,18 @@
 :docstring:
 :members:

+::: pytorch_optimizer.gradfilter_ema
+:docstring:
+:members:
+
+::: pytorch_optimizer.gradfilter_ma
+:docstring:
+:members:
+
+::: pytorch_optimizer.GrokFastAdamW
+:docstring:
+:members:
+
 ::: pytorch_optimizer.GSAM
 :docstring:
 :members:
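The three documentation stubs added above expose the two standalone gradient filters (`gradfilter_ema`, `gradfilter_ma`) and the bundled `GrokFastAdamW` optimizer. A hedged usage sketch follows; only the conventional `params` and `lr` constructor arguments are assumed here, and the GrokFast-specific keyword arguments are left at their defaults since their exact names belong to the docstrings being published by these stubs.

```python
# Hedged usage sketch for the newly documented GrokFastAdamW.
# Assumes only the conventional (params, lr=...) constructor arguments;
# see the generated docstring for the GrokFast-specific options.
import torch
import torch.nn.functional as F
from pytorch_optimizer import GrokFastAdamW

model = torch.nn.Linear(128, 10)
optimizer = GrokFastAdamW(model.parameters(), lr=1e-3)

x = torch.randn(32, 128)
y = torch.randint(0, 10, (32,))

for _ in range(10):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
```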
