Stochastic Gradient Descent
Stochastic gradient descent performs a parameter update for each individual sample in the dataset, rather than computing the gradient over the full dataset before each update:

\theta = \theta - \eta \cdot \nabla_\theta J(\theta; x^{(i)}, y^{(i)})

Where:
- x^{(i)}, y^{(i)} = i'th input sample and label in the dataset
- all other equations and parameters are as explained in gradient descent
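To make the per-sample update concrete, here is a minimal sketch for a linear model with squared-error loss; the function name, learning rate, and loss choice are illustrative assumptions, not part of the original note.

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=10, seed=0):
    """Per-sample SGD sketch for a linear model with squared-error loss.

    X: (n_samples, n_features) inputs; y: (n_samples,) labels.
    Returns the learned parameter vector theta.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch.
        for i in rng.permutation(len(X)):
            x_i, y_i = X[i], y[i]
            # Gradient of 0.5 * (x_i . theta - y_i)^2 with respect to theta.
            grad = (x_i @ theta - y_i) * x_i
            # One parameter update per sample: theta <- theta - eta * grad.
            theta -= lr * grad
    return theta
```

Shuffling each epoch is a common convention rather than a requirement of the update rule itself.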
Pros:
- Each update is fast to compute, since it uses only a single sample
- Similar convergence guarantees to batch gradient descent, though the learning rate should be gradually annealed over time to achieve this (see the sketch after this list)
- Supports online learning
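One simple way to anneal, as mentioned above, is to shrink the step size as the number of updates grows; the 1/t-style schedule and its parameters below are illustrative assumptions.

```python
def annealed_lr(initial_lr, t, decay=1e-3):
    # Shrinks the step size as the update count t grows (1/t-style decay),
    # so later updates take smaller, less noisy steps.
    return initial_lr / (1.0 + decay * t)
```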
Cons:
- Produces high-variance parameter updates, which cause the objective to fluctuate heavily between steps
See also: gradient-descent batch-gradient-descent minibatch-gradient-descent
AKA: SGD, incremental gradient descent