A Recurrent Neural Network Language Model.
Dept. of Computer Science, University of Toronto
Recurrent Neural Networks (RNNs) form an extremely powerful class of sequence
models. Despite their expressiveness, RNNs have not gained widespread use
because they were long considered too difficult to train with gradient descent.
In this talk, we show that the recently developed Hessian-Free optimizer can
train RNNs on problems that were previously out of their reach, such as
problems that exhibit long-term dependencies.
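The difficulty comes from the recurrence itself. The sketch below (a hypothetical
NumPy illustration, not the implementation used in this work) shows the standard
vanilla-RNN update: the same weight matrix is applied at every time step, so
gradients backpropagated through long sequences tend to vanish or explode.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_input = 64, 32          # sizes chosen only for illustration

W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden-to-hidden weights
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_input))   # input-to-hidden weights
b_h = np.zeros(n_hidden)

def rnn_forward(xs, h=None):
    """Apply the recurrence h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)."""
    h = np.zeros(n_hidden) if h is None else h
    states = []
    for x in xs:                    # xs: iterable of input vectors
        h = np.tanh(W_hh @ h + W_xh @ x + b_h)
        states.append(h)
    return states

# Example: run the recurrence over a 100-step random input sequence.
states = rnn_forward(rng.normal(size=(100, n_input)))
```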
We then train an RNN to predict the next character in sequences of
characters from Wikipedia. The RNN acquires a large vocabulary and a
significant amount of knowledge of English grammar, and its predictive
log probability is 10% better than that of PPM (Prediction by Partial Matching).
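As a concrete illustration of the character-level setup, the following
self-contained sketch (toy vocabulary and an untrained model, not the system
described above) feeds one-hot character vectors through a vanilla RNN and
scores a string by the average log probability assigned to each true next
character, which is the quantity compared against PPM.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = sorted(set("hello wikipedia"))           # toy character vocabulary
char_to_id = {c: i for i, c in enumerate(vocab)}
n_vocab, n_hidden = len(vocab), 64

W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden-to-hidden
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_vocab))   # char-to-hidden
W_hy = rng.normal(scale=0.1, size=(n_vocab, n_hidden))   # hidden-to-logits
b_h, b_y = np.zeros(n_hidden), np.zeros(n_vocab)

def avg_log_prob(text):
    """Average log p(next char | preceding chars) under the (untrained) RNN."""
    h, total = np.zeros(n_hidden), 0.0
    for cur, nxt in zip(text[:-1], text[1:]):
        x = np.zeros(n_vocab)
        x[char_to_id[cur]] = 1.0                 # one-hot encoding of current char
        h = np.tanh(W_hh @ h + W_xh @ x + b_h)   # RNN state update
        logits = W_hy @ h + b_y
        probs = np.exp(logits - logits.max())    # softmax over the vocabulary
        probs /= probs.sum()
        total += np.log(probs[char_to_id[nxt]])
    return total / (len(text) - 1)

print(avg_log_prob("hello wikipedia"))
```

Training would adjust these weights to maximize exactly this per-character
predictive log probability.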
This is joint
work with James Martens and Geoff Hinton.