This is a hybrid between batch gradient descent and stochastic gradient descent.

At each iteration, it picks one sample at random and recomputes the gradient for that sample only; for all other samples, the gradient from the previous iteration is kept around. It then takes a step just like full batch gradient descent, using the average of these stored gradients!

At iteration $k$, with random index $i_k$, the update is

$$
y_i^{(k)} =
\begin{cases}
\nabla f_i\left(\theta^{(k-1)}\right) & \text{if } i = i_k,\\
y_i^{(k-1)} & \text{otherwise,}
\end{cases}
\qquad
\theta^{(k)} = \theta^{(k-1)} - \frac{\alpha}{n} \sum_{i=1}^{n} y_i^{(k)}
$$

Where:

  1. $\theta$ = parameters
  2. $f = \frac{1}{n}\sum_{i=1}^{n} f_i$ = the function to be optimized
  3. $\alpha$ = learning rate
  4. $n$ = number of samples
  5. $i_k$ = random index at the current iteration
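A minimal Python sketch of this update, assuming a caller-supplied `grad_i(theta, i)` helper that returns the gradient of the $i$-th sample's loss (the name, signature, and hyperparameter defaults are just for illustration):

```python
import numpy as np

def sag(grad_i, theta0, n_samples, lr=0.01, n_iters=1000, rng=None):
    """Sketch of the Stochastic Average Gradient (SAG) update.

    grad_i(theta, i) -> gradient of the i-th sample's loss at theta
    (hypothetical helper supplied by the caller).
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()

    # Table of the most recently computed gradient for every sample,
    # plus a running sum of the table rows so the average is cheap.
    grads = np.zeros((n_samples, theta.size))
    grad_sum = np.zeros_like(theta)

    for _ in range(n_iters):
        i = rng.integers(n_samples)      # pick one sample at random
        g_new = grad_i(theta, i)         # fresh gradient for that sample only

        # Replace that sample's stored gradient; all others are kept as-is.
        grad_sum += g_new - grads[i]
        grads[i] = g_new

        # Batch-gradient-descent-style step on the average of stored gradients.
        theta -= lr * grad_sum / n_samples

    return theta

# Example use on a toy least-squares problem (per-sample gradient of
# (x_i . theta - y_i)^2 / 2), purely illustrative:
X, y = np.random.randn(100, 3), np.random.randn(100)
theta = sag(lambda t, i: (X[i] @ t - y[i]) * X[i], np.zeros(3), n_samples=100)
```

Note that only one per-sample gradient is recomputed per step, but the parameter update still averages over all $n$ stored gradients.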