SRI Seminar Series: Owain Evans

Our weekly SRI Seminar Series welcomes Owain Evans, a research associate at Oxford University’s Future of Humanity Institute. Evans’ research interests are in AI safety and the future of AI, with a current focus on truthful and honest AI.

In this talk, Evans will present recent work on defining and measuring “truthfulness” in large language models, on their calibration, and on their ability to forecast world events. He will relate these topics to reducing epistemic harms from AI and to the problem of value alignment for artificial general intelligence.

Talk title:

“Truthful language models and AI alignment”

Abstract:

Like it or not, language models will play an increasingly central role in how people learn about the world and communicate with others. This poses a challenge. Can we create models that are factually accurate, calibrated (e.g., avoiding overconfidence), and reliably non-manipulative? This kind of model would help individuals and society to form more accurate beliefs and to avoid misinformation. It would also have the potential to help with the problem of AGI alignment or AGI risk (Bostrom 2014, Russell 2019).

I will present recent work on defining and measuring “truthfulness” for language models, on calibration, and on using models to forecast world events. I will discuss connections to reducing epistemic harms from AI and to the problem of AGI alignment.
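
A brief aside for readers new to the calibration point in the abstract: one standard measure is expected calibration error (ECE), which groups a model's answers by stated confidence and compares average confidence to average accuracy within each group. The sketch below is illustrative background only, with hypothetical data; it is not drawn from Evans' own work.

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        """ECE: the sample-weighted average gap, across confidence bins,
        between mean stated confidence and mean actual accuracy."""
        confidences = np.asarray(confidences, dtype=float)  # assumed in (0, 1]
        correct = np.asarray(correct, dtype=float)          # 1 if right, 0 if wrong
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (confidences > lo) & (confidences <= hi)
            if in_bin.any():
                gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
                ece += in_bin.mean() * gap  # weight by fraction of samples in bin
        return ece

    # A well-calibrated model's 80%-confident answers are right about 80% of the time.
    print(expected_calibration_error([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1]))  # 0.35

An overconfident model (high confidence, low accuracy) scores a large ECE, while a well-calibrated one scores near zero.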

Recommended readings:

O. Evans et al., “Truthful AI: Developing and governing AI that does not lie,” arXiv preprint, 2021.

S. Lin, J. Hilton, and O. Evans, “TruthfulQA: Measuring How Models Mimic Human Falsehoods,” arXiv preprint, 2021.

A. Zou et al., “Forecasting Future World Events with Neural Networks,” arXiv preprint, 2022.

About Owain Evans

Owain Evans is a research associate at the Future of Humanity Institute at Oxford University. His research interests are in AI safety and the future of AI. He received his PhD from MIT. In 2019, he was a visiting scholar in the CHAI group at UC Berkeley. He is on the board of directors at Ought, a non-profit lab that created the AI research assistant Elicit. He has worked on preference learning, reinforcement learning, forecasting, and philosophical questions relating to AI. His recent work aims to understand truthfulness and honesty for AI models.