@@ -53,7 +53,7 @@ Also, most of the captures are taken from `Ranger21` paper.
This idea was originally proposed in the `NFNet (Normalizer-Free Networks)` paper.
AGC (Adaptive Gradient Clipping) clips gradients based on the `unit-wise ratio of gradient norms to parameter norms`.

-* github : [code](https://github.com/deepmind/deepmind-research/tree/master/nfnets)
+* code : [github](https://github.com/deepmind/deepmind-research/tree/master/nfnets)
* paper : [arXiv](https://arxiv.org/abs/2102.06171)

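As a rough illustration (a minimal PyTorch sketch, not this library's actual implementation; the function names here are made up), the unit-wise clipping looks like this:

```python
import torch


def unitwise_norm(x: torch.Tensor) -> torch.Tensor:
    # L2 norm per output unit: reduce over every dim except dim 0 (biases/scalars get a single norm).
    if x.ndim <= 1:
        return x.norm(2)
    return x.norm(2, dim=tuple(range(1, x.ndim)), keepdim=True)


@torch.no_grad()
def adaptive_gradient_clipping(parameters, clip_factor: float = 0.01, eps: float = 1e-3) -> None:
    # Rescale each unit's gradient so that ||g|| <= clip_factor * max(||w||, eps).
    for p in parameters:
        if p.grad is None:
            continue
        w_norm = unitwise_norm(p).clamp(min=eps)
        g_norm = unitwise_norm(p.grad)
        scale = (clip_factor * w_norm / g_norm.clamp(min=1e-6)).clamp(max=1.0)
        p.grad.mul_(scale)
```

It would be called after `loss.backward()` and before `optimizer.step()`.
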
### Gradient Centralization (GC)
@@ -62,7 +62,7 @@ AGC (Adaptive Gradient Clipping) clips gradients based on the `unit-wise ratio o

Gradient Centralization (GC) operates directly on gradients by centralizing the gradient to have zero mean.

-* github : [code](https://github.com/Yonghongwei/Gradient-Centralization)
+* code : [github](https://github.com/Yonghongwei/Gradient-Centralization)
* paper : [arXiv](https://arxiv.org/abs/2004.01461)

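A minimal sketch of the operation (illustrative only, not this library's API):

```python
import torch


def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    # Subtract the mean taken over all dims except the output dim, so each unit's
    # gradient slice has zero mean; 1-D gradients (e.g. biases) are left unchanged.
    if grad.ndim > 1:
        grad = grad - grad.mean(dim=tuple(range(1, grad.ndim)), keepdim=True)
    return grad
```

An optimizer would apply this to `p.grad` of convolution/linear weights right before computing the update.
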
### Softplus Transformation
@@ -83,7 +83,7 @@ By running the final variance denom through the softplus function, it lifts extr

![positive_negative_momentum](assets/positive_negative_momentum.png)

-* github : [code](https://github.com/zeke-xie/Positive-Negative-Momentum)
+* code : [github](https://github.com/zeke-xie/Positive-Negative-Momentum)
* paper : [arXiv](https://arxiv.org/abs/2103.17182)

### Linear learning-rate warm-up
@@ -96,22 +96,22 @@ By running the final variance denom through the softplus function, it lifts extr

![stable_weight_decay](assets/stable_weight_decay.png)

-* github : [code](https://github.com/zeke-xie/stable-weight-decay-regularization)
+* code : [github](https://github.com/zeke-xie/stable-weight-decay-regularization)
* paper : [arXiv](https://arxiv.org/abs/2011.11152)

### Explore-exploit learning-rate schedule

![explore_exploit_lr_schedule](assets/explore_exploit_lr_schedule.png)

-* github : [code](https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis)
+* code : [github](https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis)
* paper : [arXiv](https://arxiv.org/abs/2003.03977)

### Lookahead

`k` steps forward, 1 step back. `Lookahead` consists of keeping an exponential moving average of the weights that is
updated and substituted for the current weights every `k_{lookahead}` steps (5 by default).

-* github : [code](https://github.com/alphadl/lookahead.pytorch)
+* code : [github](https://github.com/alphadl/lookahead.pytorch)
* paper : [arXiv](https://arxiv.org/abs/1907.08610v2)

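A minimal wrapper sketch of that loop (illustrative; the linked implementation handles optimizer state and closures more carefully):

```python
import torch


class LookaheadSketch:
    # Wrap a base optimizer; every k steps, pull the slow weights toward the fast
    # (inner) weights by a factor alpha and substitute them back into the model.
    def __init__(self, base_optimizer, k: int = 5, alpha: float = 0.5):
        self.base_optimizer = base_optimizer
        self.k = k
        self.alpha = alpha
        self.step_count = 0
        self.slow_weights = [
            [p.detach().clone() for p in group["params"]]
            for group in base_optimizer.param_groups
        ]

    def zero_grad(self, set_to_none: bool = True):
        self.base_optimizer.zero_grad(set_to_none=set_to_none)

    @torch.no_grad()
    def step(self):
        self.base_optimizer.step()
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow in zip(self.base_optimizer.param_groups, self.slow_weights):
                for fast, slow_p in zip(group["params"], slow):
                    slow_p.add_(fast - slow_p, alpha=self.alpha)  # slow += alpha * (fast - slow)
                    fast.copy_(slow_p)                            # substitute back into the model


# usage (hypothetical): optimizer = LookaheadSketch(torch.optim.AdamW(model.parameters()), k=5, alpha=0.5)
```
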
### Chebyshev learning rate schedule
@@ -120,6 +120,15 @@ Acceleration via Fractal Learning Rate Schedules

* paper : [arXiv](https://arxiv.org/abs/2103.01338v1)

+### (Adaptive) Sharpness-Aware Minimization (A/SAM)
+
+Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
+In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
+
+* SAM paper : [paper](https://arxiv.org/abs/2010.01412)
+* ASAM paper : [paper](https://arxiv.org/abs/2102.11600)
+* A/SAM code : [github](https://github.com/davda54/sam)
+
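As a rough sketch of one SAM update (illustrative only; `loss_fn` is assumed to be a closure that recomputes the loss on the current batch, and this is not the linked repository's API):

```python
import torch


def sam_step(model, loss_fn, base_optimizer, rho: float = 0.05):
    # First pass: gradient at the current weights.
    base_optimizer.zero_grad()
    loss_fn(model).backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(2) for p in params]), 2)
    scale = rho / (grad_norm + 1e-12)

    # Ascent step: move to the (approximate) worst point in the rho-ball, w + rho * g / ||g||.
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = p.grad * scale
            p.add_(e)
            perturbations.append(e)

    # Second pass: gradient of the perturbed (sharpness-aware) loss.
    base_optimizer.zero_grad()
    loss_fn(model).backward()

    # Undo the perturbation, then update the original weights with the second gradient.
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)
    base_optimizer.step()
```
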
## Citations

<details>
@@ -370,6 +379,36 @@ Acceleration via Fractal Learning Rate Schedules

</details>

+<details>
+
+<summary>Sharpness-Aware Minimization</summary>
+
+```
+@article{foret2020sharpness,
+  title={Sharpness-aware minimization for efficiently improving generalization},
+  author={Foret, Pierre and Kleiner, Ariel and Mobahi, Hossein and Neyshabur, Behnam},
+  journal={arXiv preprint arXiv:2010.01412},
+  year={2020}
+}
+```
+
+</details>
+
+<details>
+
+<summary>Adaptive Sharpness-Aware Minimization</summary>
+
+```
+@article{kwon2021asam,
+  title={ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks},
+  author={Kwon, Jungmin and Kim, Jeongseop and Park, Hyunseo and Choi, In Kwon},
+  journal={arXiv preprint arXiv:2102.11600},
+  year={2021}
+}
+```
+
+</details>
+
## Author

Hyeongchan Kim / [@kozistr](http://kozistr.tech/about)