Speaker: Nathan Schneider, Carnegie Mellon University
Title: Bridging the Gap: Integrated Development of Linguistic Resources and Analyzers for NLP
When building datasets and analyzers for NLP, the path from linguistic description to computational implementation need not be disjointed: the goals and methodologies of each can inform one another. This talk presents forays into NLP for new text genres, following a trajectory that tightly integrates linguistic data preparation and computational modeling methodologies. I will discuss analyzers for syntax and semantics built with a process encompassing 1) representation, 2) annotation, and 3) automation. First, I will describe a new framework for broad-coverage lexical semantic analysis, with special attention to multiword expressions (such as "high school" and "go over") in the web reviews domain (Schneider et al., 2014 in LREC and TACL). To facilitate robust and efficient modeling at the token level, the lexical semantic representation has been designed to be compatible with shallow discriminative sequence models, with algorithmic enhancements to accommodate multiword expressions containing gaps. Second, I will touch on efforts to build syntactic datasets, taggers, and dependency parsers for Twitter messages (Gimpel et al., ACL 2011; Owoputi et al., NAACL 2013; Schneider et al., LAW 2013; Kong et al., EMNLP 2014). Finally, if time permits, I will mention parallel efforts to model relational and functionalist semantics.
Nathan Schneider recently defended his Ph.D. at Carnegie Mellon University's Language Technologies Institute, advised by Noah Smith. Nathan’s research focuses on linguistic analysis problems in NLP involving syntax and semantics, especially in web genres. His dissertation develops a framework for broad-coverage, token-level computational lexical semantics. As an undergraduate, he studied Computer Science and Linguistics (with an emphasis on cognitive approaches to Semitic morphology) at the University of California, Berkeley. In the fall, he will join the University of Edinburgh for a postdoc under the supervision of Mark Steedman.