Speaker: Amit Gruber
Department of Computer Science
University of Toronto
Title: Latent Topic Models for Hypertext
Abstract: Latent topic models have been successfully applied as an unsupervised
topic discovery technique in large document collections. With the
proliferation of hypertext document collections such as the Internet,
there has also been great interest in extending these approaches to
hypertext (Cohn and Hofmann '01, Erosheva et al. '04). These
approaches typically model links in an analogous fashion to how they
model words: the document-link co-occurrence matrix is modeled in the
same way that the document-word co-occurrence matrix is modeled in
standard topic models.
We present a probabilistic generative model for hypertext document
collections that explicitly models the generation of links.
Specifically, a link from a word w to a document d depends directly on
how frequently the topic of w appears in d, in addition to the in-degree
of d. We show how to perform EM learning on this model efficiently.
Because links are not modeled as analogous to words, the model uses far
fewer free parameters and obtains better link prediction results.
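As a rough illustration of the link-generation step described above, the following minimal Python sketch (not the authors' implementation) computes a distribution over candidate target documents in which the probability of linking to a document d is proportional to the count of the linking word's topic in d, weighted by a per-document in-degree term. All names here (link_target_distribution, topic_counts, in_degree_weight) are assumptions introduced for illustration only.

    import numpy as np

    def link_target_distribution(topic_of_word, topic_counts, in_degree_weight):
        # topic_of_word: index z of the topic assigned to the linking word w.
        # topic_counts: (num_docs, num_topics) array; topic_counts[d, z] is how
        #   often topic z occurs in candidate target document d.
        # in_degree_weight: (num_docs,) array; a per-document weight playing the
        #   role of the in-degree of d.
        # Returns P(link -> d) proportional to topic_counts[d, z] * in_degree_weight[d].
        scores = topic_counts[:, topic_of_word] * in_degree_weight
        total = scores.sum()
        if total == 0.0:
            # Fall back to a uniform choice if no candidate contains the topic.
            return np.full(len(scores), 1.0 / len(scores))
        return scores / total

    # Toy usage: 3 candidate documents, 2 topics.
    topic_counts = np.array([[5.0, 1.0],
                             [0.0, 4.0],
                             [2.0, 2.0]])
    in_degree_weight = np.array([1.0, 3.0, 1.0])
    print(link_target_distribution(topic_of_word=1,
                                   topic_counts=topic_counts,
                                   in_degree_weight=in_degree_weight))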
Joint work with Michal Rosen-Zvi and Yair Weiss.
For additional information contact: Hugo Larochelle at http://www.cs.toronto.edu/~larocheh/