Distinguished Lecture Series: David Duvenaud, “The big picture of LLM dangerous capability evals”

  • Schwartz Reisman Innovation Campus, Room W240 (2nd floor) 108 College Street Toronto, ON M5G 0C6 Canada

The University of Toronto Department of Computer Science’s C.C. “Kelly” Gotlieb Distinguished Lecture Series welcomes Associate Professor David Duvenaud, Schwartz Reisman Chair in Technology and Society at the Schwartz Reisman Institute for Technology and Society (SRI), for a special lecture co-presented with the SRI Seminar Series. A founding faculty member and Canada CIFAR AI Chair at the Vector Institute, Duvenaud is widely recognized for his contributions to AI safety, probabilistic deep learning, and generative modelling.

Duvenaud’s current research focuses on assessing dangerous capabilities in frontier AI models, mitigating catastrophic risks and developing institutional frameworks for post-AGI futures. In this talk, he will give an overview of his recent research conducted while part of Anthropic’s Alignment Science Team evaluating risks from advanced models, and how to develop more robust methods for AI alignment with human institutions and interests.

Moderator: Professor Sheila McIlraith, Department of Computer Science

We gratefully acknowledge the support of the Webster Family Charitable Giving Foundation for this event.

Talk title

“The big picture of LLM dangerous capability evals”

Abstract

How can we avoid AI disasters? The plan so far is mostly to check the extent to which AIs could cause catastrophic harms based on tests in controlled conditions. However, there are obvious problems with this approach, both technical and due to their limited scope. I'll give an overview of the work my team at Anthropic did to evaluate risks due to models feigning incompetence, colluding, or sabotaging human decision-making. I'll also discuss the idea of “control” techniques, which use AIs to monitor and set traps to look for bad behavior in other AIs. Finally, I'll outline the main problems beyond the scope of these approaches, in particular that of robustly aligning our institutions to human interests.


Venue

Schwartz Reisman Innovation Campus, University of Toronto, Room W240 (second floor)
108 College Street, Toronto, ON M5G 0C6

Seminar will be broadcast live via Zoom (register for link).


About David Duvenaud

David Duvenaud is an associate professor in the Departments of Computer Science and Statistical Sciences at the University of Toronto, where he holds a Schwartz Reisman Chair in Technology and Society. A leading voice in AI safety and artificial general intelligence (AGI) governance, Duvenaud currently works on evaluating dangerous capabilities in advanced AI systems, mitigating catastrophic risks from future models, and developing institutional designs for post-AGI futures. He is a Canada CIFAR AI Chair, a founding faculty member at the Vector Institute, and a member of Innovation, Science and Economic Development Canada’s Safe and Secure AI Advisory Group. He recently completed an extended sabbatical with the Alignment Science team at Anthropic.

Duvenaud’s early work helped shape the field of probabilistic deep learning, with contributions including neural ordinary differential equations, gradient-based hyperparameter optimization, and generative models for molecular design. He has received numerous honors, including the Sloan Research Fellowship, Ontario Early Researcher Award, and best paper awards at NeurIPS, ICML, and ICFP. Before joining the University of Toronto, Duvenaud was a postdoctoral fellow in the Harvard Intelligent Probabilistic Systems group and completed his PhD at the University of Cambridge under Carl Rasmussen and Zoubin Ghahramani.


About the SRI Seminar Series

The SRI Seminar Series brings together the Schwartz Reisman community and beyond for a robust exchange of ideas that advance scholarship at the intersection of technology and society. Seminars are led by a leading or emerging scholar and feature extensive discussion.

Each week, a featured speaker will present for 45 minutes, followed by an open discussion. Registered attendees will be emailed a Zoom link before the event begins. The event will be recorded and posted online.

About the C.C. “Kelly” Gotlieb Distinguished Lecture Series

The C.C. “Kelly” Gotlieb Distinguished Lecture Series is named in honour of Professor Emeritus Calvin C. Gotlieb, a founding member of the Department of Computer Science. Through his vision, inspiration, and leadership, Gotlieb brought Canadians into the modern age of computing. His efforts led to the establishment of the Computation Centre at the University of Toronto in 1948 (which would become our Department of Computer Science). In 1951, he introduced the first undergraduate and graduate courses in computing in Canada. Gotlieb’s progressive ideas and revolutionary vision remain the foundation for today’s computer technology. To quote from his Order of Canada, “He has been largely responsible for leading Canadians into the modern age of computing… and has contributed immeasurably to the understanding and development of information technology in the academic community.”