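This performs gradient descent using the entire dataset for every update: the gradient is computed over all training examples before the parameters are changed.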

The equations and their meanings are the same as those defined for gradient descent.
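As a minimal sketch of the idea, the following applies full-dataset updates to a least-squares linear model. The model, the mean squared error loss, the function name `batch_gradient_descent`, the learning rate, and the synthetic data are all assumptions chosen for illustration; they are not taken from these notes.

```python
import numpy as np


def batch_gradient_descent(X, y, lr=0.1, n_iters=500):
    """Fit linear-regression weights by gradient descent on the full dataset.

    Assumed loss (illustrative): J(theta) = 1/(2n) * sum((X @ theta - y)**2).
    """
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)
    for _ in range(n_iters):
        # Every update uses ALL samples: this is what makes it "batch".
        residuals = X @ theta - y
        grad = X.T @ residuals / n_samples
        theta -= lr * grad
    return theta


# Hypothetical noise-free data: y = 2 - 3x, with a column of ones for the bias.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.uniform(-1.0, 1.0, size=200)])
true_theta = np.array([2.0, -3.0])
y = X @ true_theta

theta = batch_gradient_descent(X, y, lr=0.5, n_iters=2000)
print(theta)
```

Because this loss is convex and the step size is small enough, the estimate should approach `true_theta`. Note that each iteration touches every row of `X`, which is where the memory and speed costs listed under Cons come from.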

Pros:

  1. Guaranteed to converge to the global minimum for convex error surfaces, given a suitable learning rate.
  2. Converges to a local minimum for non-convex surfaces.

Cons:

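  1. Doesn’t work for datasets that are too large to fit in memory (out-of-core), since every update needs the whole dataset.
  2. Thus, no online learning: the model cannot be updated incrementally as new examples arrive.
  3. Runs slowly on large datasets, since each update requires a full pass over the data.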