Stochastic Average Gradient Descent
This is a hybrid between batch gradient descent and stochastic gradient descent.
At each iteration it picks a sample at random and recomputes the gradient for that sample; for every other sample, the gradient stored from a previous iteration is kept. The parameters are then updated using the average of all stored gradients, i.e. a full batch-style step over this gradient memory (see the sketch after the symbol list below).
At iteration $k$, with a randomly chosen index $i_k$, the stored gradients are

$$
y_i^{(k)} =
\begin{cases}
\nabla f_{i_k}(\theta^{(k-1)}) & \text{if } i = i_k \\
y_i^{(k-1)} & \text{otherwise}
\end{cases}
$$

and the update is

$$
\theta^{(k)} = \theta^{(k-1)} - \frac{\alpha}{n} \sum_{i=1}^{n} y_i^{(k)}
$$

Where:
- $\theta$ = parameters
- $f$ = the function to be optimized, $f(\theta) = \frac{1}{n}\sum_{i=1}^{n} f_i(\theta)$ with per-sample terms $f_i$
- $\alpha$ = learning rate
- $n$ = number of samples
- $i_k$ = random index at the current iteration
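A minimal Python sketch of this update, assuming a user-supplied `grad_i(theta, i)` that returns the gradient of the $i$-th sample's loss; the function name, signature, and default values are illustrative, not from any particular library:

```python
import numpy as np

def sag(grad_i, theta0, n, alpha=0.01, iters=1000, seed=0):
    """Stochastic Average Gradient sketch.

    grad_i(theta, i) -> gradient of the i-th sample's loss at theta
    (illustrative signature, not a library API).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    memory = np.zeros((n, theta.size))   # y_i: one stored gradient per sample
    total = np.zeros(theta.size)         # running sum of all stored gradients
    for _ in range(iters):
        i = rng.integers(n)              # pick one sample at random
        g = grad_i(theta, i)             # fresh gradient for that sample only
        total += g - memory[i]           # swap its old contribution for the new one
        memory[i] = g
        theta -= alpha * total / n       # step with the average of stored gradients
    return theta
```

As a usage example, for a least-squares loss $\tfrac{1}{2}(x_i^\top \theta - y_i)^2$ one could pass `grad_i = lambda theta, i: X[i] * (X[i] @ theta - y[i])`, with `X` and `y` being the data matrix and targets. Keeping the running sum `total` avoids re-summing all $n$ stored gradients at every step.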
See also: batch-gradient-descent, stochastic-gradient-descent
AKA: SAG