Deep learning algorithms allow computers to learn from large amounts of data and are known for advancing the state of the art in the artificial intelligence field, leading to smarter cancer treatments and self-driving technology.
Now, researchers at the University of Toronto and the University of Massachusetts Amherst (UMA) have applied deep learning algorithms to software for the animation and gaming industry.
“As an animator, I’m trying to animate realistically,” said Chris Landreth, an Academy Award-winning animator and filmmaker who was appointed to U of T's computer science department as a distinguished research artist-in-residence.
Landreth and U of T computer science Professor Karan Singh – along with UMA PhD students Yang Zhou and Zhan Xu and Assistant Professors Evangelos Kalogerakis and Subhransu Maji – presented “VisemeNet: Audio-Driven Animator-Centric Speech Animation” at the Association for Computing Machinery’s (ACM) Special Interest Group on Computer Graphics and Interactive Techniques (SIGGRAPH) conference this week in Vancouver.
The premier event attracts thousands in the research, animation and software industries. It’s where Disney-Pixar screened its earliest advances in computer animation, including the 1989 Academy Award-winning Tin Toy.
Using findings from psycho-linguistics literature, the researchers have created a network to predict visemes – the mouth shapes that correspond to phonemes, the basic units of speech sound. Several phonemes, such as p, b and m, correspond to a single viseme. Conversely, a single phoneme – such as “oh” – can vary in appearance from face to face.
This variation is the result of how we breathe, as well as how the tongue, jaw and facial muscles are used in speech. The researchers’ network predicts these movements, which track to the lower part of the face.
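The many-to-one relationship between phonemes and visemes can be illustrated with a small Python sketch. This is only a toy lookup table, not the researchers' method: the table `PHONEME_TO_VISEME`, its viseme labels and the function `phonemes_to_visemes` are hypothetical names invented for illustration and are not taken from VisemeNet or JALI.

```python
# Hypothetical, simplified phoneme-to-viseme table for illustration only.
# The viseme labels are assumptions, not the ones used by VisemeNet or JALI.
PHONEME_TO_VISEME = {
    "p": "lips_closed",
    "b": "lips_closed",
    "m": "lips_closed",
    "f": "lip_to_teeth",
    "v": "lip_to_teeth",
    "oh": "rounded_open",
}


def phonemes_to_visemes(phonemes):
    """Map each phoneme to a viseme label, merging consecutive repeats."""
    visemes = []
    for phoneme in phonemes:
        # Phonemes missing from the toy table fall back to a neutral mouth shape.
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        if not visemes or visemes[-1] != viseme:
            visemes.append(viseme)
    return visemes


if __name__ == "__main__":
    # p, b and m all share one mouth shape, so they collapse to one viseme.
    print(phonemes_to_visemes(["p", "b", "m"]))
    print(phonemes_to_visemes(["m", "oh", "v"]))
```

A fixed table like this cannot capture the face-to-face variation the researchers describe, which is one reason a network that predicts mouth and jaw movement from audio is a different and more flexible approach.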
Combining the results of phonemes and facial movements, the researchers are able to use only an actor’s voice as audio input to predict the speech motion curves. These curves are then fully editable in animator software, which the researchers say is a fundamental advantage of their method.
VisemeNet was developed as part of JALI, or jaw and lip integration. JALI allows animators to create realistic and expressive speech animations of computer-generated imagery (CGI) characters. The commercially available research by Singh, Landreth, PhD student Pif Edwards and Professor Eugene Fiume can be integrated into current animator software.
This research was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).