Speaker: James Martens Computer Science Dept. University of Toronto
Title: A Quasi-Newton Optimization Algorithm for Deep and Temporal Neural Networks
Abstract: Neural networks are extremely complex parameterized functions, often with millions of free parameters, and are very useful tools in machine learning / AI for building systems that can perform perceptual tasks such as object recognition. Unfortunately, the most powerful neural networks, those with many nested processing layers (called 'deep'), as well as those which can process temporal data, have been considered virtually impossible to train, especially with a pure optimization approach. In this talk I will describe a recently developed quasi-Newton algorithm for unconstrained high-dimensional optimization which has been able to resolve the long-outstanding problem of how to train neural networks with temporal or deep architectures from random initializations. This is an example of a problem where using a better optimizer didn't simply make things "faster" by some factor, or accelerate convergence at the end, but made the difference between finding good solutions and getting hopelessly stuck in bad regions of the parameter space before making any substantial progress on the objective. The approach is based on the general framework known as "Hessian-free Newton" (aka truncated-Newton, aka Newton-CG), and also relies on a generalization of the Gauss-Newton matrix to objectives beyond nonlinear least squares. I will also discuss how some of the practical advice I received from the optimization field turned out to be wrong, and how much of the formal theory was at best irrelevant, and at worst, misleading and counterproductive.
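To make the "Hessian-free Newton" framework mentioned above concrete, here is a minimal illustrative sketch of a single truncated-Newton (Newton-CG) step. This is a toy, not Martens' actual implementation: conjugate gradient approximately solves H p = -g using only Hessian-vector products, obtained here by a finite difference of gradients, so the Hessian matrix is never formed explicitly. The objective (a small convex quadratic) and all function names are invented for illustration.

```python
import numpy as np

def hessian_vector_product(grad_f, x, v, eps=1e-6):
    # H v is approximated as (grad f(x + eps*v) - grad f(x)) / eps,
    # so only gradient evaluations are needed, never the full Hessian.
    return (grad_f(x + eps * v) - grad_f(x)) / eps

def truncated_newton_step(grad_f, x, max_cg_iters=50, tol=1e-10):
    # Approximately solve H p = -g with conjugate gradient,
    # truncating the inner iteration early (hence "truncated Newton").
    g = grad_f(x)
    p = np.zeros_like(x)
    r = -g.copy()              # residual of H p = -g at p = 0
    d = r.copy()
    rs_old = r @ r
    for _ in range(max_cg_iters):
        Hd = hessian_vector_product(grad_f, x, d)
        curvature = d @ Hd
        if curvature <= 0:     # negative curvature: simplest safeguard is to stop
            break
        alpha = rs_old / curvature
        p += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        if rs_new < tol:
            break
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return p

# Toy convex quadratic f(x) = 0.5 x^T A x - b^T x, whose gradient is A x - b;
# one truncated-Newton step from the origin lands on the minimizer A^{-1} b.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
grad_f = lambda x: A @ x - b

x = np.zeros(3) + truncated_newton_step(grad_f, np.zeros(3))
```

In Martens' actual method, the plain Hessian is replaced by a generalized Gauss-Newton matrix and the curvature products are damped; the sketch above only shows the bare Newton-CG skeleton that the abstract names.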
