Speaker: Erick Galani Maziero, University of Sao Paulo, Brazil
Title: Rhetorical Analysis Based on Large Amount of Data
Text possesses an elaborate structure that relates all of its content, giving it coherence. Several methodologies have been employed in automatic discourse analysis, among them approaches based on lexical patterns and supervised machine learning. These approaches rely on annotated data, which is costly to obtain. Semi-supervised learning makes it possible to use unlabelled data, which is cheap and abundant, but this approach raises many challenges. In this talk I will present an overview of my PhD research and of what I am developing here at UofT. My PhD is about never-ending semi-supervised learning of discourse analysis from large amounts of data, according to Rhetorical Structure Theory (RST). The talk will cover the corpora, the feature set, the proposed architecture, and some challenges of the proposed learning. I will also speak about NILC, the biggest NLP group in Brazil.
For additional information, contact: Tong Wang