Minibatch Gradient Descent
This performs each gradient descent update on a small batch of samples from the dataset, rather than on the full dataset (batch gradient descent) or a single sample at a time (stochastic gradient descent).
The update rule is:

$\theta = \theta - \eta \cdot \nabla_\theta J(\theta; x^{(i:i+b)}, y^{(i:i+b)})$

Where:
- $b$ = batch size
- $x^{(i:i+b)}, y^{(i:i+b)}$ = the i'th minibatch of inputs and labels from the dataset
- all other equations and parameters are as explained in gradient descent
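A minimal sketch of this update in Python/NumPy, assuming a linear model with mean-squared-error loss; the function and variable names (`minibatch_gd`, `X`, `y`, `theta`, `eta`, `b`) are illustrative, not part of this note:

```python
import numpy as np

def minibatch_gd(X, y, eta=0.01, b=32, epochs=100):
    """Minibatch gradient descent for linear least squares (illustrative)."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        perm = np.random.permutation(n)      # shuffle once per epoch
        for i in range(0, n, b):
            idx = perm[i:i + b]              # indices of the i'th minibatch
            X_b, y_b = X[idx], y[idx]
            # Gradient of the mean-squared error over this minibatch only
            grad = 2.0 / len(idx) * X_b.T @ (X_b @ theta - y_b)
            theta -= eta * grad              # parameter update
    return theta
```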
Pros:
- Reduces the high-variance parameter updates seen in stochastic gradient descent
- Can convert some level-2 BLAS operations (matrix-vector) into level-3 (matrix-matrix) operations, which make much better use of modern hardware; see the sketch after the cons list below
- Convergence guarantees similar to those of batch gradient descent
- Supports online learning
Cons:
- $b$ is now an additional hyperparameter to be tuned; as in stochastic gradient descent, the learning rate typically still has to be decreased over time.
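To illustrate the BLAS point from the pros list: processing samples one at a time uses matrix-vector products (level-2 BLAS), while a minibatch can be pushed through as a single matrix-matrix product (level-3 BLAS), which is typically far more efficient. A small sketch (all names are made up for the example):

```python
import numpy as np

W = np.random.randn(256, 128)        # layer weights
X_batch = np.random.randn(32, 128)   # minibatch of 32 inputs

# One sample at a time: 32 separate matrix-vector (level-2) products
outs = [W @ x for x in X_batch]

# Whole minibatch at once: one matrix-matrix (level-3) product
outs_batched = X_batch @ W.T

assert np.allclose(np.stack(outs), outs_batched)
```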
This is one of the most commonly used minimization procedures in ML.
See also: gradient-descent batch-gradient-descent stochastic-gradient-descent