AMSgrad
Similar to Adam, but it also keeps a running element-wise maximum of the past squared-gradient (second-moment) estimates and uses that maximum, rather than the current exponential average, to scale the update. This keeps the effective step size from growing and avoids Adam's convergence to suboptimal solutions on some problems.
Where: v_hat_t = max(v_hat_{t-1}, v_t); all other symbols and remaining equations are as explained in Adam.
See also: adam
References: http://ruder.io/optimizing-gradient-descent/index.html#amsgrad
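For concreteness, here is a minimal NumPy sketch of one AMSgrad parameter update, following the formulation in the reference above (which omits Adam's bias correction). The function name `amsgrad_update`, the `state` tuple, and the default hyperparameters are illustrative assumptions, not part of any particular library's API.

```python
import numpy as np

def amsgrad_update(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSgrad step; `state` holds (m, v, v_hat), all initialized to zeros."""
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate, as in Adam
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate, as in Adam
    v_hat = np.maximum(v_hat, v)              # element-wise max of all past v's
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)  # normalize by the max, not v
    return theta, (m, v, v_hat)

# Usage: keep the state across iterations.
theta = np.zeros(3)
state = (np.zeros(3), np.zeros(3), np.zeros(3))
grad = np.array([0.1, -0.2, 0.05])            # gradient from one training step
theta, state = amsgrad_update(theta, grad, state)
```

Because v_hat is non-decreasing, the per-parameter denominator never shrinks, which is exactly the property that distinguishes AMSgrad from plain Adam.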