Nesterov Accelerated Gradient
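Momentum gradient descent first computes the gradient at the current position and then makes a big jump in the direction of the updated accumulated gradient, which can overshoot the minimum. NAG reverses this order to reduce the overshoot.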
NAG first makes a big jump in the direction of the previously accumulated gradient (the momentum term carried over from the previous step). It then measures the gradient where it ends up and makes a correction accordingly.
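In one common notation, the NAG update can be written as shown here. The symbols are assumptions and may differ from those used in the momentum gradient descent note: $v_t$ is the velocity, $\gamma$ the momentum coefficient, $\eta$ the learning rate, and $J(\theta)$ the objective.

$$
\begin{aligned}
v_t &= \gamma v_{t-1} + \eta \nabla_\theta J(\theta - \gamma v_{t-1}) \\
\theta &= \theta - v_t
\end{aligned}
$$

The only difference from plain momentum is that the gradient is evaluated at the look-ahead point $\theta - \gamma v_{t-1}$ instead of at $\theta$.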
Where:
- all other equations and parameters are as explained in momentum gradient descent
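The jump-then-correct order can be sketched in a few lines of Python. This is a minimal illustration on a one-dimensional problem, not a reference implementation: the function name `nag_minimize`, its parameters, and the default values are hypothetical and chosen only for this example.

```python
def nag_minimize(grad, theta0, lr=0.1, gamma=0.9, steps=200):
    """Minimize a 1-D function with Nesterov accelerated gradient.

    grad   : callable returning the gradient at a point (assumed interface)
    theta0 : starting point
    lr     : learning rate
    gamma  : momentum coefficient
    """
    theta = float(theta0)
    v = 0.0  # accumulated gradient (velocity), starts at zero
    for _ in range(steps):
        # 1. Big jump: look ahead along the previously accumulated gradient.
        lookahead = theta - gamma * v
        # 2. Correction: measure the gradient at the look-ahead point
        #    and fold it into the velocity.
        v = gamma * v + lr * grad(lookahead)
        # 3. Apply the combined step.
        theta = theta - v
    return theta


if __name__ == "__main__":
    # Example objective J(theta) = theta**2, whose gradient is 2 * theta.
    result = nag_minimize(lambda t: 2.0 * t, theta0=5.0)
    print(result)
```

Replacing `grad(lookahead)` with `grad(theta)` turns this loop into plain momentum gradient descent, which makes the difference between the two methods easy to see.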
See also: gradient-descent momentum-gradient-descent
AKA: NAG
References: