AMSgrad
Similar to Adam, but it also keeps a running element-wise maximum of the past squared-gradient (second-moment) estimates and uses that maximum, rather than the current exponential average, to scale the update. This keeps the effective step size from growing and avoids Adam's convergence to suboptimal solutions on some problems.
Where: v_hat_t = max(v_hat_{t-1}, v_t); all other symbols and remaining equations are as explained in Adam.
See also: adam
References: http://ruder.io/optimizing-gradient-descent/index.html#amsgrad
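For concreteness, here is a minimal NumPy sketch of one AMSgrad parameter update, following the formulation in the reference above (which omits Adam's bias correction). The function name `amsgrad_update`, the `state` tuple, and the default hyperparameters are illustrative assumptions, not part of any particular library's API.

```python
import numpy as np

def amsgrad_update(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSgrad step; `state` holds (m, v, v_hat), all initialized to zeros."""
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate, as in Adam
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate, as in Adam
    v_hat = np.maximum(v_hat, v)              # element-wise max of all past v's
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)  # normalize by the max, not v
    return theta, (m, v, v_hat)

# Usage: keep the state across iterations.
theta = np.zeros(3)
state = (np.zeros(3), np.zeros(3), np.zeros(3))
grad = np.array([0.1, -0.2, 0.05])            # gradient from one training step
theta, state = amsgrad_update(theta, grad, state)
```

Because v_hat is non-decreasing, the per-parameter denominator never shrinks, which is exactly the property that distinguishes AMSgrad from plain Adam.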