Batch Gradient Descent

Batch gradient descent computes the gradient of the cost function over the entire training dataset before making each parameter update.

The equations and their meanings are as defined in gradient descent.
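
As a minimal sketch of what "the entire dataset" means in practice, the snippet below applies the usual update $\theta \leftarrow \theta - \eta \nabla_\theta J(\theta)$ to a toy linear regression. The notation, the mean-squared-error cost, the function name `batch_gradient_descent`, and the parameters `lr` and `n_epochs` are all assumptions made for illustration and may differ from the definitions in the gradient descent note.

```python
import numpy as np


def batch_gradient_descent(X, y, lr=0.5, n_epochs=200):
    """Minimise J(theta) = mean((X @ theta - y) ** 2) / 2 (assumed cost)."""
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)
    for _ in range(n_epochs):
        # The gradient is computed over ALL samples; this is the "batch" part.
        residuals = X @ theta - y
        grad = X.T @ residuals / n_samples
        # Exactly one parameter update per full pass over the data.
        theta -= lr * grad
    return theta


# Toy data (hypothetical): y = 1 + 2x, with a bias column prepended to X.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta = batch_gradient_descent(X, y)
print(theta)  # should be close to the true parameters [1, 2]
```

Because this cost is convex, the iterates approach the single global minimum, provided the learning rate is small enough. Note that every iteration touches every row of `X`, which is where the cost of the method comes from.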

Pros:

Guaranteed to converge to the global minimum for convex $J$, given a suitably small learning rate.
Converges to a local minimum for non-convex surfaces.

Cons:

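Intractable for datasets that do not fit in memory (out-of-core), since every update needs the full dataset.
No online learning: the model cannot be updated with new examples on the fly.
Slow: each parameter update requires a full pass over the dataset.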

See also: gradient-descent minibatch-gradient-descent stochastic-gradient-descent
References:

https://www.slideshare.net/SebastianRuder/optimization-for-deep-learning