
Commit 4735dce

docs: README.rst
1 parent 50f2bac commit 4735dce

File tree: 1 file changed, +10 -21 lines

README.rst

@@ -73,10 +73,8 @@ Also, most of the captures are taken from ``Ranger21`` paper.
 Adaptive Gradient Clipping (AGC)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-| This idea originally proposed in ``NFNet (Normalized-Free Network)``
-  paper.
-| AGC (Adaptive Gradient Clipping) clips gradients based on the
-  ``unit-wise ratio of gradient norms to parameter norms``.
+| This idea originally proposed in ``NFNet (Normalized-Free Network)`` paper.
+| AGC (Adaptive Gradient Clipping) clips gradients based on the ``unit-wise ratio of gradient norms to parameter norms``.
 
 - code :
   `github <https://github.com/deepmind/deepmind-research/tree/master/nfnets>`__
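
A minimal sketch of the unit-wise AGC rule described above, assuming PyTorch; the helper name ``agc_`` and the ``clip_factor``/``eps`` defaults are illustrative, not taken from the linked ``nfnets`` code.

.. code-block:: python

    import torch


    def agc_(param: torch.Tensor, clip_factor: float = 0.01, eps: float = 1e-3) -> None:
        """Clip ``param.grad`` in-place based on the unit-wise ratio of gradient norm to parameter norm."""
        if param.grad is None:
            return

        # unit-wise norms: per-output-unit for 2D+ weights, whole-tensor for biases/scalars
        dims = tuple(range(1, param.ndim))
        if dims:
            p_norm = param.norm(dim=dims, keepdim=True).clamp_min(eps)
            g_norm = param.grad.norm(dim=dims, keepdim=True).clamp_min(1e-6)
        else:
            p_norm = param.norm().clamp_min(eps)
            g_norm = param.grad.norm().clamp_min(1e-6)

        max_norm = p_norm * clip_factor
        # rescale only the units whose gradient norm exceeds the allowed maximum
        scale = torch.where(g_norm > max_norm, max_norm / g_norm, torch.ones_like(g_norm))
        param.grad.mul_(scale)
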
@@ -99,8 +97,7 @@ centralizing the gradient to have zero mean.
 Softplus Transformation
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-By running the final variance denom through the softplus function, it
-lifts extremely tiny values to keep them viable.
+By running the final variance denom through the softplus function, it lifts extremely tiny values to keep them viable.
 
 - paper : `arXiv <https://arxiv.org/abs/1908.00700>`__
 
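
A minimal sketch of the softplus transformation described above, assuming an Adam-style second-moment estimate ``exp_avg_sq``; the helper name and the ``beta`` value are illustrative.

.. code-block:: python

    import torch
    import torch.nn.functional as F


    def softplus_denom(exp_avg_sq: torch.Tensor, beta: float = 50.0) -> torch.Tensor:
        """Return softplus(sqrt(v)) as the update denominator instead of sqrt(v) + eps."""
        de_nom = exp_avg_sq.sqrt()
        # softplus(x) = log(1 + exp(beta * x)) / beta: roughly x for large x,
        # but bounded away from zero for tiny x, which keeps the step size viable.
        return F.softplus(de_nom, beta=beta)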

@@ -123,8 +120,7 @@ Positive-Negative Momentum
 | .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/positive_negative_momentum.png |
 +--------------------------------------------------------------------------------------------------------------------+
 
-- code :
-  `github <https://github.com/zeke-xie/Positive-Negative-Momentum>`__
+- code : `github <https://github.com/zeke-xie/Positive-Negative-Momentum>`__
 - paper : `arXiv <https://arxiv.org/abs/2103.17182>`__
 
 Linear learning-rate warm-up
@@ -143,8 +139,7 @@ Stable weight decay
 | .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/stable_weight_decay.png |
 +-------------------------------------------------------------------------------------------------------------+
 
-- code :
-  `github <https://github.com/zeke-xie/stable-weight-decay-regularization>`__
+- code : `github <https://github.com/zeke-xie/stable-weight-decay-regularization>`__
 - paper : `arXiv <https://arxiv.org/abs/2011.11152>`__
 
 Explore-exploit learning-rate schedule
@@ -154,18 +149,14 @@ Explore-exploit learning-rate schedule
 | .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png |
 +---------------------------------------------------------------------------------------------------------------------+
 
-
-- code :
-  `github <https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis>`__
+- code : `github <https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis>`__
 - paper : `arXiv <https://arxiv.org/abs/2003.03977>`__
 
 Lookahead
 ~~~~~~~~~
 
-| ``k`` steps forward, 1 step back. ``Lookahead`` consisting of keeping
-  an exponential moving average of the weights that is
-| updated and substituted to the current weights every ``k_{lookahead}``
-  steps (5 by default).
+| ``k`` steps forward, 1 step back. ``Lookahead`` consisting of keeping an exponential moving average of the weights that is
+| updated and substituted to the current weights every ``k_{lookahead}`` steps (5 by default).
 
 - code : `github <https://github.com/alphadl/lookahead.pytorch>`__
 - paper : `arXiv <https://arxiv.org/abs/1907.08610v2>`__
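
A minimal sketch of the Lookahead mechanism described above ("k steps forward, 1 step back"), assuming PyTorch and an arbitrary inner optimizer; ``LookaheadSketch`` and its defaults are illustrative, not the linked package's API. ``base_optimizer`` can be any ``torch.optim`` optimizer built over the same parameters.

.. code-block:: python

    import torch


    class LookaheadSketch:
        def __init__(self, params, base_optimizer, k: int = 5, alpha: float = 0.5):
            self.params = list(params)
            self.base_optimizer = base_optimizer
            self.k = k
            self.alpha = alpha
            self.steps = 0
            # slow weights start as a copy of the current (fast) weights
            self.slow_weights = [p.detach().clone() for p in self.params]

        def step(self) -> None:
            self.base_optimizer.step()  # "k steps forward" with the inner optimizer
            self.steps += 1
            if self.steps % self.k == 0:
                # "1 step back": move the slow (EMA-like) weights toward the fast
                # weights, then substitute them back into the model
                with torch.no_grad():
                    for slow, fast in zip(self.slow_weights, self.params):
                        slow.add_(fast - slow, alpha=self.alpha)
                        fast.copy_(slow)
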
@@ -180,10 +171,8 @@ Acceleration via Fractal Learning Rate Schedules
 (Adaptive) Sharpness-Aware Minimization (A/SAM)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-| Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value
-  and loss sharpness.
-| In particular, it seeks parameters that lie in neighborhoods having
-  uniformly low loss.
+| Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
+| In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
 
 - SAM paper : `paper <https://arxiv.org/abs/2010.01412>`__
 - ASAM paper : `paper <https://arxiv.org/abs/2102.11600>`__
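
A minimal sketch of the two-pass SAM update described above, assuming PyTorch and a re-evaluatable loss ``closure``; the function name and the ``rho`` default are illustrative, not the exact interface of the referenced implementations.

.. code-block:: python

    import torch


    def sam_step(model, closure, base_optimizer, rho: float = 0.05) -> None:
        """One SAM update: ascend to the worst nearby point, then descend from there."""
        # first pass: gradients at the current weights
        base_optimizer.zero_grad()
        closure().backward()

        params = [p for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        scale = rho / (grad_norm + 1e-12)

        # perturb each weight toward the (approximate) highest-loss point in the neighborhood
        with torch.no_grad():
            e_ws = []
            for p in params:
                e_w = p.grad * scale
                p.add_(e_w)
                e_ws.append(e_w)

        # second pass: gradients at the perturbed weights drive the actual update
        base_optimizer.zero_grad()
        closure().backward()

        with torch.no_grad():
            for p, e_w in zip(params, e_ws):
                p.sub_(e_w)  # restore the original weights before stepping
        base_optimizer.step()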
