How can deep neural networks be trained more effectively at scale? The right solution could accelerate machine learning innovations across a wide range of applications.
Gennady Pekhimenko, an assistant professor in the department of computer and mathematical sciences at U of T Scarborough and the tri-campus graduate department of computer science, recently received Amazon’s 2020 Machine Learning Research Award and Facebook’s 2020 AI System Hardware/Software Co-Design Research Award to address this challenge.
Advances in both hardware and software have propelled the ongoing revolution in machine learning (ML), allowing vast amounts of data to be processed efficiently. At the same time, algorithms for deep neural networks (DNNs) have improved.
As DNN models become more powerful and sophisticated, with ever more parameters and layers, they demand more computing power. Because a single hardware accelerator has limited computing power, training DNNs more effectively calls for efficient, scalable distributed training, which splits the work of training a model across multiple processors.
Major challenges stand in the way. Current approaches to distributed training scale poorly because they do not address a fundamental limitation of DNN training algorithms: a strong sequential dependency between layers. This dependency is imposed by the backpropagation algorithm, the cornerstone of DNN training, which computes gradients one layer at a time, from the output back to the input, with each layer waiting on the result from the layer after it.
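To make that dependency concrete, here is a toy sketch, assuming a hypothetical network of three scalar layers, each computing `h_next = tanh(w * h)`, trained with a squared-error loss. The functions `forward` and `backward` are illustrative names, not code from Pekhimenko's work. The backward loop can only visit the layers last-to-first, because each step needs the `upstream` gradient produced by the step before it.

```python
import math

def forward(weights, x):
    """Toy chain of scalar layers h_{i+1} = tanh(w_i * h_i); keeps every activation."""
    activations = [x]
    for w in weights:
        activations.append(math.tanh(w * activations[-1]))
    return activations

def backward(weights, activations, target):
    """Backpropagate L = 0.5 * (h_n - target)^2; return gradients and visit order."""
    grads = [0.0] * len(weights)
    order = []
    # dL/dh for the network output
    upstream = activations[-1] - target
    # Sequential dependency: layer i cannot start until layer i + 1 has
    # produced `upstream`, so the layers are processed strictly in reverse.
    for i in reversed(range(len(weights))):
        h_in, h_out = activations[i], activations[i + 1]
        local = upstream * (1.0 - h_out ** 2)  # chain rule through tanh
        grads[i] = local * h_in                # dL/dw_i
        upstream = local * weights[i]          # dL/dh_i, handed to layer i - 1
        order.append(i)
    return grads, order

weights = [0.5, -1.2, 0.8]
acts = forward(weights, 1.0)
grads, order = backward(weights, acts, 0.0)
print(order)  # layers visited from output to input
```

Running it prints `[2, 1, 0]`. Note also that `forward` must keep every intermediate activation for the backward pass, which is one reason training consumes so much memory.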
Additionally, many existing DNN models consume large amounts of memory, which limits how effectively an accelerator's computing power can be used. Memory also caps the size and depth of the models that machine learning researchers can explore as they search for new, more efficient ones.
To address these challenges, Pekhimenko proposes targeting two major goals:
First, he aims to develop fundamentally new machine learning algorithms and to co-design them with new and existing computing systems, improving their scalability in current and emerging ML applications.
Second, he plans to design new optimizations that significantly reduce memory and bandwidth consumption when training machine learning models.
Ultimately, a more effective and scalable way to train deep neural networks could accelerate advances in machine learning, in applications ranging from computer vision to natural language processing.