
Colloquium Series: Ruiqi Zhong, "Building Expert-Level Language Models from Decomposed Weak Validations"

  • Bahen Centre for Information Technology, Room 3200, 40 Saint George Street, Toronto, ON M5S 2E4, Canada

This talk has been postponed and will be rescheduled.



Speaker:

Ruiqi Zhong

Talk Title:

Building Expert-Level Language Models from Decomposed Weak Validations

Date and Location:

Tuesday, March 25, 2025

Bahen Centre for Information Technology, BA 3200

This lecture is open to the public. No registration is required, but space is limited.

Abstract:

Language models (LMs) can process large volumes of information and perform complex reasoning. They hold the promise of executing expert-level tasks, such as brainstorming scientific hypotheses or developing complex software. However, building these LMs requires humans to validate their outputs, which is challenging; e.g., developers cannot easily validate whether complex software is bug-free. If our validation is fallible, LMs may learn to "hack" the validators, convincing us that they are right even when they are wrong.

To address this, I show how to decompose complex validation tasks into "weaker" ones that are easier for humans or LMs: e.g., validating return values rather than entire programs, or validating discoveries on individual samples rather than on entire datasets. Through several examples, I show how these techniques allow us to use LMs for expert-level tasks more reliably. Looking forward, I discuss how to use LMs to automate these task decompositions, and how we can use these frameworks to monitor both individual AI systems and their broader impact within society.
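The idea of validating return values rather than entire programs can be illustrated with a small property-based check. The sketch below is a hypothetical illustration, not the speaker's actual method: instead of reading and verifying a whole LM-generated sorting routine, a "weak" validator only checks easy-to-verify properties of each output (sortedness and preservation of the input's elements) on sampled inputs. The names `candidate_sort` and `weak_validate` are invented for this example.

```python
import random

def candidate_sort(xs):
    # Stand-in for an LM-generated implementation whose internals
    # we do not want to (or cannot easily) verify directly.
    return sorted(xs)

def weak_validate(fn, trials=100):
    """Validate return values on sampled inputs rather than the entire program.

    For each trial, checks two properties that are far easier to verify
    than the program itself: the output is sorted, and it contains
    exactly the same elements as the input.
    """
    for _ in range(trials):
        xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
        out = fn(list(xs))
        is_sorted = all(a <= b for a, b in zip(out, out[1:]))
        same_items = sorted(out) == sorted(xs)
        if not (is_sorted and same_items):
            return False
    return True
```

A correct implementation passes all trials, while a program that drops or alters elements fails as soon as a nonempty input is sampled; checking each output is a much weaker, and therefore more tractable, task than auditing the program's source.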

About Ruiqi Zhong:

Ruiqi Zhong is a final-year Ph.D. student at UC Berkeley, co-advised by Jacob Steinhardt and Dan Klein. He was previously a part-time member of technical staff at Anthropic, where he worked on the automated red-teaming team. His research sits at the intersection of machine learning and NLP, and he develops language model systems to advance the frontier of human capabilities. He developed the earliest prototype of instruction tuning, and his research contributions have been scaled up by leading language model companies such as Google, OpenAI, and Anthropic.