Speaker: William Li, MIT
Title: Language Technologies for Understanding Law, Politics, and Public Policy
Through their activities, governments generate large amounts of heterogeneous text data, including judicial opinions, congressional and parliamentary bills, and laws and regulations. The public availability of these datasets offers opportunities for computational social scientists to develop novel algorithms and systems and to answer research questions in law, political science, and public policy. In this talk, I will focus on two recent projects in this domain: 1) authorship attribution of unsigned U.S. Supreme Court opinions and 2) unsupervised pattern discovery on the United States Code, in which we use analogies from software engineering to analyze and visualize the U.S. legal code like a large software codebase. Finally, I will discuss ongoing work on developing and applying text reuse methods to find and summarize repeated sections of text in government bills and citizen comments, including a probabilistic extension of existing deterministic text reuse methods inspired by topic modeling approaches.
William Li is a PhD student in computer science at MIT. His dissertation research focuses on natural language processing and data science on open government datasets. He is also interested in accessibility and language-based assistive technologies for people with disabilities; he co-taught a semester-long assistive technology design course in Fall 2014 and helps run the MIT Assistive Technology Club.