This is a hybrid between batch gradient descent and stochastic gradient descent.

At each iteration, it picks one sample at random and recomputes the gradient for that sample only; for all other samples, the gradient from the previous iteration is kept around. It then takes a step just like full batch gradient descent, using the average of these stored gradients!

At iteration $k$, with random index $i_k$, the update is

$$
y_i^{(k)} =
\begin{cases}
\nabla f_i\left(\theta^{(k-1)}\right) & \text{if } i = i_k,\\
y_i^{(k-1)} & \text{otherwise,}
\end{cases}
\qquad
\theta^{(k)} = \theta^{(k-1)} - \frac{\alpha}{n} \sum_{i=1}^{n} y_i^{(k)}
$$

Where:

  1. $\theta$ = parameters
  2. $f = \frac{1}{n}\sum_{i=1}^{n} f_i$ = the function to be optimized
  3. $\alpha$ = learning rate
  4. $n$ = number of samples
  5. $i_k$ = random index at the current iteration
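A minimal Python sketch of this update, assuming a caller-supplied `grad_i(theta, i)` helper that returns the gradient of the $i$-th sample's loss (the name, signature, and hyperparameter defaults are just for illustration):

```python
import numpy as np

def sag(grad_i, theta0, n_samples, lr=0.01, n_iters=1000, rng=None):
    """Sketch of the Stochastic Average Gradient (SAG) update.

    grad_i(theta, i) -> gradient of the i-th sample's loss at theta
    (hypothetical helper supplied by the caller).
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()

    # Table of the most recently computed gradient for every sample,
    # plus a running sum of the table rows so the average is cheap.
    grads = np.zeros((n_samples, theta.size))
    grad_sum = np.zeros_like(theta)

    for _ in range(n_iters):
        i = rng.integers(n_samples)      # pick one sample at random
        g_new = grad_i(theta, i)         # fresh gradient for that sample only

        # Replace that sample's stored gradient; all others are kept as-is.
        grad_sum += g_new - grads[i]
        grads[i] = g_new

        # Batch-gradient-descent-style step on the average of stored gradients.
        theta -= lr * grad_sum / n_samples

    return theta

# Example use on a toy least-squares problem (per-sample gradient of
# (x_i . theta - y_i)^2 / 2), purely illustrative:
X, y = np.random.randn(100, 3), np.random.randn(100)
theta = sag(lambda t, i: (X[i] @ t - y[i]) * X[i], np.zeros(3), n_samples=100)
```

Note that only one per-sample gradient is recomputed per step, but the parameter update still averages over all $n$ stored gradients.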