Description
Part of #684
E.g.
- gradient descent with momentum (Adding Adam optimiser #1460)
- adaptive learning rate gradient descent, e.g. AdaGrad (AdaGrad optimiser #1468)
(Note: both links are for stochastic gradient descent, but we'd be implementing simpler, non-stochastic versions; see the sketch below.)
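A minimal sketch of what the two non-stochastic variants could look like, using NumPy. The function names, signatures, and default parameters here are illustrative assumptions, not this library's API; both methods take a callable returning the full (not mini-batch) gradient.

```python
import numpy as np


def gradient_descent_momentum(f_grad, x0, learning_rate=0.01, momentum=0.9,
                              n_iters=1000):
    """Plain (non-stochastic) gradient descent with momentum.

    f_grad: callable returning the full gradient at x.
    Names and defaults are illustrative, not part of any existing API.
    """
    x = np.asarray(x0, dtype=float)
    velocity = np.zeros_like(x)
    for _ in range(n_iters):
        g = f_grad(x)
        # Classical momentum: keep a decaying running average of past steps.
        velocity = momentum * velocity - learning_rate * g
        x = x + velocity
    return x


def gradient_descent_adagrad(f_grad, x0, learning_rate=0.1, eps=1e-8,
                             n_iters=1000):
    """Plain gradient descent with an AdaGrad-style adaptive step size."""
    x = np.asarray(x0, dtype=float)
    g_sq_sum = np.zeros_like(x)
    for _ in range(n_iters):
        g = f_grad(x)
        # AdaGrad: shrink the step per-coordinate by the accumulated
        # squared gradients, so frequently-updated directions slow down.
        g_sq_sum += g ** 2
        x = x - learning_rate * g / (np.sqrt(g_sq_sum) + eps)
    return x


if __name__ == "__main__":
    # Toy example: minimise f(x) = x0^2 + 10*x1^2 (minimum at the origin).
    grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
    print(gradient_descent_momentum(grad, [5.0, 5.0]))
    print(gradient_descent_adagrad(grad, [5.0, 5.0]))
```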
Won't be brilliant, but could be very informative, so this is largely an educational thing (or just fun).