Speaker: James Martens
Computer Science Dept.
University of Toronto
Title: A Quasi-Newton Optimization Algorithm for Deep and Temporal Neural Networks
Neural networks are extremely complex parameterized functions, often with millions of free parameters, and are very useful tools in machine learning / AI for building systems that can perform perceptual tasks such as object recognition. Unfortunately, the most powerful neural networks, those with many nested processing layers (called 'deep'), as well as those which can process temporal data, have long been considered virtually impossible to train, especially with a pure optimization approach.
In this talk I will describe a recently developed quasi-Newton algorithm for unconstrained high-dimensional optimization which has been able to resolve the long-standing problem of how to train neural networks with temporal or deep architectures from random initializations. This is an example of a problem where using a better optimizer didn't simply make things "faster" by some factor, or accelerate convergence at the end, but made the difference between finding good solutions and getting hopelessly stuck in bad regions of the parameter space before making any substantial progress on the objective. The approach is based on the general framework known as "Hessian-free Newton" (aka truncated-Newton, aka Newton-CG), and also relies on a generalization of the Gauss-Newton matrix to objectives beyond nonlinear least squares. I will also discuss how some of the practical advice I received from the optimization field turned out to be wrong, and how much of the formal theory was at best irrelevant, and at worst, misleading and counterproductive.
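To make the "Hessian-free Newton" idea concrete, here is a minimal sketch, not the speaker's actual method: each Newton step solves the linear system (H + λI)p = -∇f approximately by conjugate gradient, where the curvature matrix is never formed explicitly. Only Hessian-vector products are needed, approximated here by finite differences of the gradient; the toy objective (a regularized logistic regression) and all parameter values are illustrative assumptions.

```python
import numpy as np

# Toy strongly convex objective: L2-regularized logistic regression
# (an illustrative stand-in, not the neural-network objectives from the talk).
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
y = np.sign(A @ rng.normal(size=5))
lam = 0.1

def loss(w):
    m = y * (A @ w)
    return np.mean(np.logaddexp(0.0, -m)) + 0.5 * lam * (w @ w)

def grad(w):
    m = y * (A @ w)
    s = 0.5 * (1.0 - np.tanh(0.5 * m))          # sigmoid(-m), numerically stable
    return -(A.T @ (y * s)) / len(y) + lam * w

def conjugate_gradient(hvp, b, max_iters=100, tol=1e-10):
    # Approximately solve (H + damping*I) p = b using only matrix-vector
    # products -- the "Hessian-free" part: H itself is never materialized.
    p = np.zeros_like(b)
    r = b.copy()                                 # residual b - H p  (p = 0)
    d = r.copy()
    rs = r @ r
    for _ in range(max_iters):
        Hd = hvp(d)
        alpha = rs / (d @ Hd)
        p += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        if rs_new < tol:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p

def truncated_newton(grad_fn, w0, iters=20, damping=1e-2, eps=1e-6):
    w = w0.copy()
    for _ in range(iters):
        g = grad_fn(w)
        # Finite-difference Hessian-vector product: Hv ~ (g(w + eps*v) - g(w)) / eps,
        # plus a damping term, a crude stand-in for the (damped) Gauss-Newton
        # curvature used in the talk's method.
        hvp = lambda v, w=w, g=g: (grad_fn(w + eps * v) - g) / eps + damping * v
        w = w + conjugate_gradient(hvp, -g)
    return w

w = truncated_newton(grad, np.zeros(5))
```

In real large-scale settings the gradients and curvature products would come from backpropagation over mini-batches rather than finite differences, but the outer structure (an inner CG solve per Newton step, with damping) is the same.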