Similar to Adam, but it keeps a running element-wise maximum of the second-moment (squared-gradient) estimate, v̂_t = max(v̂_{t-1}, v_t), and uses this maximum in the update step instead of v_t. Because v̂_t is non-decreasing, the effective step size cannot grow over time, which fixes cases where Adam converges to a suboptimal point.

Where:

  1. all other symbols and the remaining equations are as explained in Adam
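A minimal sketch of one update step, assuming Adam's usual hyperparameters (lr, beta1, beta2, eps) and applying Adam's bias correction to the first moment only; real implementations differ slightly in where they apply bias correction. The function name `amsgrad_step` and its signature are illustrative, not from any particular library.

```python
import numpy as np

def amsgrad_step(theta, grad, m, v, v_hat, t,
                 lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One parameter update: Adam, except v_hat keeps the running max of v."""
    m = beta1 * m + (1 - beta1) * grad          # first moment, as in Adam
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment, as in Adam
    v_hat = np.maximum(v_hat, v)                # key difference: non-decreasing max
    m_hat = m / (1 - beta1 ** t)                # bias correction, as in Adam
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v, v_hat
```

Since v̂_t only ever grows, the denominator √v̂_t never shrinks, so the per-coordinate step size is monotonically non-increasing.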