Spoken Language Systems Group Develops New Automatic Speaker Tracking Technique
24 October 2013
CSAIL’s Spoken Language Systems Group has unveiled a new technique for automatically tracking speakers in audio recordings. The technique tackles the task of speaker diarization: computationally determining who is speaking when in a recording, and how many speakers are present. Traditional approaches to automatic speaker diarization rely on supervised machine learning, in which a program learns from annotated conversations that indicate when different speakers enter a conversation and how many are speaking at any given time.
In a paper to be published in the October issue of IEEE Transactions on Audio, Speech, and Language Processing, researchers from the Spoken Language Systems Group describe a new technique for speaker diarization that requires no supervised machine learning. The technique also distinguishes between individual speakers’ voices.
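The article does not spell out the group’s algorithm, but the general idea behind unsupervised diarization can be illustrated with a minimal sketch: extract a feature vector for each short audio segment (MFCCs are a common choice), then cluster the segments without any labeled training data, so that the number of clusters found estimates the number of speakers. The feature vectors below are synthetic stand-ins for real acoustic features, and the clustering method and threshold are illustrative assumptions, not the method from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic stand-ins for per-segment acoustic features (e.g. 13-dim MFCCs):
# two simulated "speakers", 20 segments each, drawn around distinct means.
rng = np.random.default_rng(0)
speaker_means = [np.full(13, 0.0), np.full(13, 5.0)]
segments = np.vstack([rng.normal(m, 0.5, size=(20, 13)) for m in speaker_means])

# Unsupervised agglomerative clustering: build a dendrogram over segments,
# then cut it at a distance threshold (chosen here by hand) to form clusters.
Z = linkage(segments, method="average", metric="euclidean")
labels = fcluster(Z, t=10.0, criterion="distance")

# Each cluster is treated as one speaker; the cluster count is the estimate.
n_speakers = len(set(labels))
print(n_speakers)
```

In a real system the threshold (or the number of clusters) would itself have to be inferred from the audio, which is a large part of what makes fully unsupervised diarization difficult.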
“You can know something about the identity of a person from the sound of their voice, so this technology is keying in to that type of information,” said Jim Glass, a senior research scientist at CSAIL and head of the Spoken Language Systems Group. “In fact, this technology could work in any language. It’s insensitive to that.”
Read the full article here: http://web.mit.edu/newsoffice/2013/automatic-speaker-tracking-in-audio-recordings-1018.html.