CSAIL Event Calendar: Previous Series
Cognitive and Computational Topics in Statistical Language Modeling
Speaker: Tom Griffiths, Brain and Cognitive Sciences Department - MIT
Generative models are widely used in statistical natural language processing, providing an intuitive means of formalizing the problem of recovering latent structure (meaning) from data (words). However, in order to permit exact inference, most applications of generative models use relatively small latent spaces and make strong independence assumptions. I will talk about several projects that employ a different strategy, exploring large (potentially infinite) and complex generative models for text using approximate inference. The latent structures postulated in these models are useful for analyzing the content of sets of documents, and for predicting and explaining various aspects of human linguistic cognition.

The common theme is the idea of representing documents as mixtures of probabilistic topics, as in Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003). I will present a simple Markov chain Monte Carlo algorithm for LDA, discuss applications of this algorithm to a corpus of scientific documents and to predicting human word association, and present extensions to this model that allow more complex dependencies between words in a document and between topics in a corpus.

This is joint work with Mark Steyvers, Josh Tenenbaum, Dave Blei, and Mike Jordan.
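As a rough illustration of the kind of Markov chain Monte Carlo algorithm the abstract refers to, here is a minimal sketch of collapsed Gibbs sampling for LDA. The talk does not specify its implementation details; the corpus, hyperparameters (symmetric Dirichlet priors `alpha` and `beta`), and function name below are illustrative assumptions, not the speaker's actual code.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of documents, each a list of word strings.
    Returns topic assignments per token, doc-topic counts,
    topic-word counts, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    word_id = {w: i for i, w in enumerate(vocab)}

    # Count tables maintained by the sampler.
    ndk = [[0] * n_topics for _ in docs]        # doc -> topic counts
    nkw = [[0] * V for _ in range(n_topics)]    # topic -> word counts
    nk = [0] * n_topics                         # tokens per topic
    z = []                                      # topic assignment per token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    # Gibbs sweeps: resample each token's topic from its full conditional.
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wid = z[d][i], word_id[w]
                # Remove the current assignment from the counts.
                ndk[d][k] -= 1; nkw[k][wid] -= 1; nk[k] -= 1
                # p(z = t | rest) ∝ (n_dt + alpha) * (n_tw + beta) / (n_t + V*beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][wid] + beta)
                           / (nk[t] + V * beta) for t in range(n_topics)]
                r = rng.random() * sum(weights)
                for t, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wid] += 1; nk[k] += 1
    return z, ndk, nkw, vocab
```

Because the sampler only ever needs the count tables (the topic-word and document-topic distributions are integrated out analytically), each sweep is a simple pass over the tokens, which is what makes this approach practical for large corpora.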