This event is organized by the Schwartz Reisman Institute for Technology and Society.
Note: Event details may change. Please refer to the Schwartz Reisman Institute for Technology and Society’s events page for the most current information.
Our weekly SRI Seminar Series welcomes Faculty Affiliate Zhijing Jin, an assistant professor at the University of Toronto and a research scientist at the Max Planck Institute, with affiliations spanning CIFAR, the Vector Institute, ELLIS, and the Schwartz Reisman Institute. Her work sits at the intersection of artificial intelligence, society, and scientific inquiry, with a focus on advancing both the technical foundations and responsible development of AI.
Jin’s research engages broad questions around large language models, causal reasoning, and AI safety, alongside collaborative and institutional approaches to governing complex AI systems. Across academic, policy, and interdisciplinary communities, she is actively involved in shaping conversations about how advanced AI can be made more robust, interpretable, and socially aligned.
Talk title:
“Emergent AI safety risks in multi-agent LLMs”
Abstract:
As AI systems take on more autonomous roles in the knowledge-work economy, they’ll increasingly interact with each other. But will these AI agents coordinate for social good, or exploit rival agents and people in ways that put humans at serious risk?
In this talk, I will explain how we assess these dangers with large-scale social simulations and game-theoretic analysis. We find that reasoning agents with sophisticated thinking often fail to sustain cooperation in a multitude of settings. Surprisingly, stronger reasoning capabilities often make models more prone to selfish strategies like free-riding. Finally, we present a framework that organizes multi-agent safety threats using well-established game-theoretic models, spanning multiple canonical dynamics grounded in diverse, realistic instantiations to probe robustness beyond any single setting. These strategic failures (where models’ decisions diverge from game-theoretic optimality) persist for state-of-the-art reasoning models, but various intervention mechanisms, such as mediation by a neutral agent and agent-to-agent commitment protocols, show a promising path towards the Pareto frontier in multi-agent scenarios.
Moderator: Nisarg Shah, Department of Computer Science
Location: Online
Suggested reading:
Giorgio Piatti, Zhijing Jin, Max Kleiman-Weiner, Bernhard Schölkopf, Mrinmaya Sachan, Rada Mihalcea, “Cooperate or collapse: Emergence of sustainable cooperation in a society of LLM agents” (NeurIPS 2024), arXiv:2404.16698
David Guzman Piedrahita, Yongjin Yang, Mrinmaya Sachan, Giorgia Ramponi, Bernhard Schölkopf, Zhijing Jin, “Corrupted by reasoning: Reasoning LLMs become free-riders in public goods games” (COLM 2025), arXiv:2506.23276
Steffen Backmann, David Guzman Piedrahita, Emanuel Tewolde, Rada Mihalcea, Bernhard Schölkopf, Zhijing Jin, “When ethics and payoffs diverge: LLM agents in morally charged social dilemmas” (preprint 2025), arXiv:2505.19212
Younwoo Choi, Changling Li, Yongjin Yang, Zhijing Jin, “Agent-to-agent theory of mind: Testing interlocutor awareness among LLMs” (EMNLP 2025), arXiv:2506.22957
About Zhijing Jin
SRI Faculty Affiliate Zhijing Jin is an incoming assistant professor at the University of Toronto and a research scientist at the Max Planck Institute, with additional affiliations as a CIFAR AI Chair, faculty member at the Vector Institute, ELLIS advisor, and faculty affiliate at the Schwartz Reisman Institute. Her work sits at the intersection of artificial intelligence, causal reasoning, and responsible AI, with a focus on advancing both the technical foundations of large language models and their alignment with societal values.
Her research spans large language models, causal inference, multi-agent systems, and AI safety, alongside complementary work in interpretability and robustness. Jin is an active contributor to the international AI research community, serving in leadership and mentorship roles across major conferences and initiatives, and her work has been recognized with multiple awards and fellowships. Her research has also been featured in outlets including WIRED, MIT News, and Chip magazine.
About the SRI Seminar Series
The SRI Seminar Series brings together the Schwartz Reisman community and beyond for a robust exchange of ideas that advance scholarship at the intersection of technology and society. Each seminar is led by an established or emerging scholar and features extensive discussion.
Each week, a featured speaker will present for 45 minutes, followed by an open discussion. Registered attendees will be emailed a Zoom link before the event begins. The event will be recorded and posted online.
