CSAIL Developing Spoken Language Browser

Despite the growing popularity of audio and video content on the Web, such material can be hard to find. Most search functions handle only text, making it very difficult to locate audio or video files that have no text associated with them.

And searching within audio content, such as a lecture, for a specific moment is a time-consuming and tedious task: a listener must fast-forward or rewind a recording to find the point they want.

To solve these problems, faculty and students at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) are building a complex language processing system that may ultimately change the way we search the Web.

Their project, a Spoken Language Browser, will someday make audio and video files searchable by typing in key words, and allow users to navigate to a specific moment in the file.

“It takes you beyond the capabilities of manually searching through audio files and gives you more control over what you are searching for,” said David Huynh, a former CSAIL graduate student and current full-time researcher in the lab. “This type of access could make it easier for people to learn about what’s going on in the world and listen to information directly from its source and make judgments for themselves rather than having it filtered through the media or another third party.”
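The article leaves the browser’s internals unspecified, but the core idea behind that kind of search, a time-aligned transcript indexed by keyword, can be sketched in a few lines of Python. Everything here, from the data format to the function name, is an assumption for illustration, not a description of CSAIL’s implementation:

```python
def build_index(transcript):
    """Map each spoken word to the times it occurs.

    `transcript` is a list of (word, start_seconds) pairs, the kind
    of time-aligned output a speech recognizer can emit.
    """
    index = {}
    for word, start in transcript:
        index.setdefault(word.lower(), []).append(start)
    return index

# A keyword lookup returns every timestamp at which the word was
# spoken, so a player can jump straight to that moment.
index = build_index([("neural", 12.4), ("networks", 12.9), ("neural", 305.2)])
print(index["neural"])  # [12.4, 305.2]
```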

The browser’s language processing involves two separate procedures: automatic speech recognition and transcription. Offline, the browser ingests the audio along with supplemental materials relevant to the lecture topic, such as books, papers, and articles. It then extracts topic-specific vocabulary and builds an adaptive language model from statistics on how those words are used in the materials.

This should also help with a chronic problem speech programs have with highly technical and specialized lectures: understanding technical terms and buzzwords. The recognizer can learn these terms along with their typical contexts from the supplemental material.
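One common way to build such an adaptive model, though not necessarily the one CSAIL uses, is to interpolate a general-purpose language model with word statistics gathered from the supplemental texts. A minimal unigram sketch, with hypothetical names and an arbitrary mixing weight:

```python
from collections import Counter

def unigram_probs(text):
    """Estimate unigram probabilities from raw text."""
    words = text.lower().split()
    total = len(words) or 1
    return {w: c / total for w, c in Counter(words).items()}

def adapt_language_model(base_probs, supplemental_texts, weight=0.3):
    """Blend a general language model with topic statistics.

    `weight` controls how strongly the supplemental materials pull
    the model toward lecture-specific vocabulary (a made-up knob).
    """
    topic_probs = unigram_probs(" ".join(supplemental_texts))
    vocab = set(base_probs) | set(topic_probs)
    return {w: (1 - weight) * base_probs.get(w, 0.0)
               + weight * topic_probs.get(w, 0.0)
            for w in vocab}

# Technical terms that appear only in the course readings now get
# nonzero probability, so the recognizer can hypothesize them.
```

Real recognizers work with n-gram or neural models and pronunciation dictionaries rather than bare unigrams, but the interpolation idea is the same.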

And with enough samples of a person’s lectures, the program can adapt to his or her style of talking. The speech recognizer can also identify and filter out coughs, sneezes, laughs, and other background noises when it transcribes the lecture.

The language browser’s main use today is to enhance the educational experience for MIT students by processing recorded lectures available through MIT OpenCourseWare and MITWorld; 200 lectures are currently available. The project’s principal investigator, Jim Glass, hopes eventually to connect with other universities to build a larger library of lectures.

The program can also help students with specific learning challenges. It could offer hearing-impaired students transcripts of speeches, lectures, and movies, all of which are expensive to transcribe manually. It could also aid non-native English speakers who have difficulty understanding their professors. While Glass is focused on the program’s educational benefits, he also sees potential uses in sports and entertainment, making it easy to locate the touchdown pass you missed or find a certain line in a movie.

In the next phase of the program, designed by Associate Professor Regina Barzilay and a group of students, the text is processed through an algorithm for topical segmentation, which reintroduces structure into the raw transcript. The program compares the similarity of two different regions of text, looking specifically for content-rich words, and then groups the text into paragraphs according to those similarities.
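Barzilay’s method is more sophisticated than this, but the underlying intuition, placing a paragraph boundary wherever adjacent stretches of text stop sharing content-rich words, can be sketched in the style of classic lexical-cohesion segmenters such as TextTiling. All names and thresholds below are illustrative assumptions:

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "that"}

def content_vector(sentences):
    """Count content-rich words (anything not on a small stoplist)."""
    words = " ".join(sentences).lower().split()
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def segment(sentences, window=5, threshold=0.1):
    """Propose a paragraph boundary wherever adjacent windows of
    sentences share few content words (low cosine similarity)."""
    return [i for i in range(window, len(sentences) - window + 1)
            if cosine(content_vector(sentences[i - window:i]),
                      content_vector(sentences[i:i + window])) < threshold]
```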

The browser is still in the prototype stage, but Glass says there has been a lot of interest in it since the MIT Museum made it available to the public in May 2006, and he remains excited about the needs it could eventually serve. “I am happy we can do something to help teachers and students by making something that creates a more accessible learning experience,” he said.