Speaker: James Martens

Department of Computer Science, University of Toronto

Title: Deep Learning via Hessian-free Optimization

Abstract: Effective training of neural-network models with many layers (deep nets) is well known within the machine learning community to be a difficult and important problem. Gradient-based optimization algorithms such as stochastic gradient descent and non-linear conjugate gradient, while quite effective for shallow networks, seem to fail for deeper ones. One possible explanation for this phenomenon is that the objective functions of deep nets suffer from more frequent and more severe bad local optima. A second is that these objectives contain complex curvature that such algorithms cannot navigate efficiently.

Recent work done under the umbrella term "deep learning" has shown that these algorithms can be salvaged if the networks are first "pre-trained". Pre-training typically consists of viewing the network as a stacked series of restricted Boltzmann machines (RBMs) or auto-encoders, one for each layer, and then greedily learning each in sequence starting from the input.

In this talk I discuss how a more advanced optimization technique that properly accounts for curvature, known as "Hessian-free optimization" or "Newton-CG", can, if carefully adapted, solve deep learning problems on large models without the need for pre-training, often achieving significantly better results. In particular, I focus on the problem of training deep auto-encoders, using the same datasets and model architectures considered by Hinton et al. (Science, 2006). These results argue in favor of the second explanation for the difficulty of training deep models, while offering a practical solution. And because this technique does not rely on any special structure of the objective function, it can be applied to virtually any model, including ones for which there is no known pre-training procedure. In particular, I show how recurrent neural networks (RNNs) can be trained effectively on certain datasets from the RNN literature that were previously thought to be un-trainable without either augmenting the model or biasing the objective towards the correct solution.
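To make the core idea concrete, here is a minimal sketch of one truncated-Newton ("Hessian-free") step on a toy quadratic. All names and parameters below are illustrative, not from the talk: the curvature-matrix system H d = -g is solved approximately by conjugate gradient, using only Hessian-vector products (here formed by finite differences of the gradient). Martens's actual method additionally uses exact Gauss-Newton matrix-vector products, damping, and other refinements.

```python
import numpy as np

def hessian_free_step(grad_fn, theta, cg_iters=50, eps=1e-6, tol=1e-10):
    """One illustrative truncated-Newton step: approximately solve
    H d = -g with conjugate gradient, where each Hessian-vector product
    is formed as Hv ~ (grad(theta + eps*v) - grad(theta)) / eps."""
    g = grad_fn(theta)
    hvp = lambda v: (grad_fn(theta + eps * v) - g) / eps
    d = np.zeros_like(theta)   # CG iterate for the search direction
    r = -g.copy()              # residual of H d = -g at d = 0
    p = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return theta + d

# Toy strongly convex quadratic f(x) = 0.5 x^T A x - b^T x,
# whose minimizer is the solution of A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
theta = hessian_free_step(grad, np.zeros(2))
```

On a quadratic the finite-difference Hessian-vector product is exact (up to floating point), so a single Newton-CG step lands at the minimizer; on a deep net's non-convex objective the step would instead be one iteration of an outer loop.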

For additional information, contact:

Hugo Larochelle

http://www.cs.toronto.edu/~larocheh/