CSAIL Event Calendar: Previous Series

Cognitive and Computational Topics in Statistical Language Modeling

Speaker: Tom Griffiths, Brain and Cognitive Sciences Department - MIT
Date: October 14, 2003
Time: 4:00pm
Location: NE43 8th floor playroom

Generative models are widely used in statistical natural language processing, providing an intuitive means of formalizing the problem of recovering latent structure (meaning) from data (words). However, in order to permit exact inference, most applications of generative models use relatively small latent spaces and make strong independence assumptions. I will talk about several projects that employ a different strategy, exploring large (potentially infinite) and complex generative models for text using approximate inference. The latent structures postulated in these models are useful for analyzing the content of sets of documents, and for predicting and explaining various aspects of human linguistic cognition. The common theme is the idea of representing documents as mixtures of probabilistic topics, as in Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003). I will present a simple Markov chain Monte Carlo algorithm for LDA, discuss applications of this algorithm to a corpus of scientific documents and to predicting human word association, and present extensions to this model that allow more complex dependencies between words in a document and between topics in a corpus. This is joint work with Mark Steyvers, Josh Tenenbaum, Dave Blei, and Mike Jordan.
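
The abstract does not say which Markov chain Monte Carlo scheme the talk uses. As an illustration only, the sketch below assumes collapsed Gibbs sampling, one common MCMC approach for LDA, in which each word's topic assignment is resampled given all the others. The function name lda_gibbs, the hyperparameters alpha and beta, and the toy corpus are hypothetical and are not taken from the talk.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=200, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA (assumed scheme).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (doc_topic, topic_word, z): per-document topic counts,
    per-topic word counts, and the final topic assignment of every token.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    z = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic:
                # (topic usage in this document) * (word probability in topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]

                # Sample a new topic and put the token back.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, z


if __name__ == "__main__":
    # Hypothetical toy corpus: word ids 0-1 and 2-3 tend to co-occur.
    toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3, 3], [0, 0, 1], [3, 2, 2]]
    doc_topic, topic_word, _ = lda_gibbs(toy_docs, n_topics=2, vocab_size=4)
    print("document-topic counts:", doc_topic)
    print("topic-word counts:", topic_word)
```

In this sketch the document-topic counts play the role of each document's mixture over topics, and the topic-word counts play the role of each topic's distribution over words. Both would normally be smoothed with alpha and beta before being read as probabilities.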

Series: CSAIL Student Seminar, Fall 2003
