Speaker: Finale Doshi-Velez, Harvard University
Title: Prediction and Interpretation with Complex Data
Not only are today's data more complex than ever before, but the tasks and explanations that we want from them are growing in sophistication. Latent variable models provide a powerful tool for summarizing data through a set of hidden variables. These models are generally trained to maximize prediction accuracy, and modern latent variable models now do an excellent job of finding compact summaries of the data with high predictive power. However, there are many situations in which good predictions alone are not sufficient. Whether the hidden variables have inherent value by providing insights about the data, or whether we wish to interface with domain experts on how to improve a system, understanding what the discovered hidden variables mean is an important first step.
In this talk, I will discuss how the language of probabilistic modeling naturally and flexibly allows us to incorporate information about how humans organize knowledge in addition to finding predictive summaries of data. In particular, I will talk about how a new model, GraphSparse LDA, discovers interpretable latent structures without sacrificing (and sometimes even improving) prediction accuracy. The model incorporates knowledge about the relationships between observed dimensions into a probabilistic framework to find a small set of human-interpretable "concepts" that summarize the observed data. This general approach can be applied to a wide variety of domains, and in particular we use it to recover interpretable descriptions of novel, clinically relevant autism subtypes from a medical dataset with thousands of dimensions.
Finale Doshi-Velez is a postdoctoral fellow jointly appointed at Harvard's School of Engineering and Applied Sciences and the Center for Biomedical Informatics. She received her MSc from the University of Cambridge in 2009 (as a Marshall Scholar) and her PhD from MIT in 2012. She was selected as one of IEEE's "AI 10 to Watch" in 2013. Her research interests include latent variable modeling, sequential decision-making, and clinical applications.
For additional information contact: Mike Brudno