Sunday, February 10, 2019

Adam Optimizer: Online Estimates of Fisher Information Matrix (F_ii) for Natural Gradient Ascent

IMHO, what the Adam optimizer is really doing is a very clever form of natural gradient ascent: it maintains online, bias-corrected estimates of the diagonal of the Fisher information matrix (via the exponential moving average of squared gradients) and uses them to scale each gradient step.
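To make the claim concrete, here is a minimal sketch of one Adam step in plain NumPy, annotating where the diagonal-Fisher reading applies. The function name `adam_step` and the quadratic toy objective are my own illustrative choices, not anything from a particular library; the update itself follows the standard Adam formulas.

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update.

    v is an exponential moving average of grad**2, i.e. an online
    estimate of the diagonal of the (empirical) Fisher information
    matrix, E[g_i^2]. Dividing by sqrt(v_hat) is the natural-gradient-
    style preconditioning step.
    """
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean gradient)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment: diag Fisher estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction (the "without bias" part)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2, so grad = 2x.
theta = np.array(5.0)
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

Note how early on, when `t` is small, the raw EMAs `m` and `v` are biased toward zero (they were initialized at zero); the `1 - beta**t` corrections are exactly what removes that bias from the Fisher estimate.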

Very cool, and what amazing engineering for it to be used so broadly :)
