Speaker: Alistair Kennedy
University of Toronto
Title: Measuring Semantic Relatedness Across Languages
Measures of semantic relatedness are well established in Natural Language Processing. Their purpose is to determine the degree of relatedness between two words, without specifying the nature of their relationship. One method of accomplishing this is to use a word's distribution to determine its meaning. Distributional measures of semantic relatedness represent each word as a weighted vector of the contexts in which that word appears. The relatedness between two words is determined by the distance between their vectors. One limitation of distributional measures is that they are successful only between pairs of words in a single language, as contexts in two different languages are not usually comparable. In this presentation I will describe a novel method of measuring semantic relatedness between pairs of words in two different languages, using distributional relatedness. This new cross-language measure uses pairs of known translations to create a mapping between distributional representations in two languages. I evaluated this new measure on two data sets. For the first, I constructed a data set of cross-language word pairs, with similarity scores, from French and English versions of Rubenstein & Goodenough's data set. My cross-language measure was evaluated based on how closely it correlated with human-assigned scores. The second evaluation used the cross-language measure to select the correct translation of a word from a set of two candidates. I found that the new cross-language measure outperformed a unilingual baseline in both experiments.
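
To make the general idea concrete, here is a minimal sketch in Python. It is not the method presented in the talk, whose details the abstract does not give. It assumes that words are stored as sparse context vectors, that relatedness is cosine similarity, and that the mapping simply re-labels each English context with its known French translation. The function names, the seed dictionary, and the toy weights are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def map_contexts(vec, translations):
    """Re-express a context vector in the other language.

    Assumption: each context is replaced by its known translation, and
    contexts with no known translation are dropped.
    """
    mapped = {}
    for context, weight in vec.items():
        target = translations.get(context)
        if target is not None:
            mapped[target] = mapped.get(target, 0.0) + weight
    return mapped


def cross_language_relatedness(vec_en, vec_fr, translations):
    """Relatedness of an English word and a French word (hypothetical)."""
    return cosine(map_contexts(vec_en, translations), vec_fr)


# Toy data, invented for illustration only.
seed = {"drive": "conduire", "road": "route", "eat": "manger"}
car_en = {"drive": 3.0, "road": 2.0, "engine": 1.0}
candidates_fr = {
    "voiture": {"conduire": 3.0, "route": 2.0},
    "pomme": {"manger": 4.0, "route": 0.5},
}

# Pick the candidate translation with the highest cross-language score.
scores = {
    word: cross_language_relatedness(car_en, vec, seed)
    for word, vec in candidates_fr.items()
}
best = max(scores, key=scores.get)
print(best, scores)
```

The last step mirrors the second evaluation described above, in which the measure chooses between two candidate translations. In this sketch "engine" has no entry in the seed dictionary and is silently dropped, which is one reason a real mapping would likely need to be more careful than a direct dictionary lookup.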