Nadam
Nadam combines Adam with Nesterov momentum: it applies the look-ahead momentum vector directly in the current parameter update.
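This entry does not show the update rule itself. As a hedged reconstruction, one commonly cited simplified form is given here. The symbols are assumptions based on the usual Adam notation, since none are defined in this entry: parameters \theta_t, gradient g_t, learning rate \eta, decay rates \beta_1 and \beta_2, bias-corrected moment estimates \hat{m}_t and \hat{v}_t, and a small constant \epsilon.

$$
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}
\left( \beta_1 \hat{m}_t + \frac{(1 - \beta_1)\, g_t}{1 - \beta_1^t} \right)
$$

Compared with plain Adam, which multiplies the step size by \hat{m}_t alone, the bracketed term mixes the bias-corrected momentum with the current gradient; this is the look-ahead part.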
Where:
- all other equations and parameters are as explained in Adam
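The update described above can be sketched in plain Python. This is a minimal illustration of a simplified Nadam step, not a reference implementation: the function names `nadam_step` and `minimize_quadratic`, the default hyperparameters, and the toy objective are all assumptions chosen for the example, and the moment estimates follow the usual Adam definitions.

```python
import math


def nadam_step(theta, grad, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One simplified Nadam update on lists of floats.

    theta, grad, m, v are equal-length lists; t is the 1-based step count.
    Returns the updated (theta, m, v). All names and defaults are illustrative.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        # Adam's exponential moving averages of the gradient and its square.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias-corrected estimates, as in Adam.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Nesterov-style look-ahead: blend the corrected momentum
        # with the current gradient instead of using m_hat alone.
        lookahead = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
        new_theta.append(p - lr * lookahead / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.05):
    """Toy usage: minimize f(x) = (x - 3)^2 starting from x = 5."""
    theta, m, v = [5.0], [0.0], [0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * (theta[0] - 3.0)]  # derivative of (x - 3)^2
        theta, m, v = nadam_step(theta, grad, m, v, t, lr=lr)
    return theta[0]
```

Calling `minimize_quadratic()` runs the step repeatedly on the toy objective and returns the final value of x, which should move from 5 toward the minimizer at 3. Replacing the `lookahead` term with `m_hat` would turn the same function into a plain Adam step.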
See also: nesterov-accelerated-gradient adam
AKA: Nesterov-accelerated adaptive moment estimation
References: