Project

Speaker Verification and Diarization

The speech signal contains information about the talker's identity, which can be used on its own, or in conjunction with other modalities, to determine a person's identity.

In our research, we aim to develop speech processing methods that allow us to determine a person's identity from a recorded speech signal. We are exploring deep learning methods that can learn a low-dimensional speaker embedding space to help with this task. Current challenges we are addressing include speaker verification based on short recordings, adverse conditions, and limited training examples.
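As a rough illustration of how a speaker embedding space can be used for verification, the sketch below scores two fixed-length embeddings with cosine similarity and accepts the trial if the score clears a threshold. The embedding values, the function names, and the 0.7 threshold are all hypothetical; in practice the embeddings would come from a trained model and the threshold would be tuned on held-out trials. This is not necessarily the scoring method used in the project.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)

def verify(enrolled, test, threshold=0.7):
    """Accept the identity claim if the score reaches the (assumed) threshold."""
    return cosine_score(enrolled, test) >= threshold

# Toy 3-dimensional embeddings, invented for illustration only.
enrolled = [0.9, 0.1, 0.2]
same_speaker = [0.8, 0.2, 0.1]
other_speaker = [-0.1, 0.9, 0.3]

print(verify(enrolled, same_speaker))
print(verify(enrolled, other_speaker))
```

Raising the threshold trades more false rejections for fewer false acceptances, which is why it is normally chosen against a target operating point rather than fixed in advance.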

A related problem is speaker diarization, which is the task of determining who is speaking when in a long audio recording such as a meeting. When the number and identity of the speakers are not known ahead of time, this is a form of unsupervised learning. There is also the multi-target detection and identification task, which aims to determine whether or not a recorded utterance was spoken by one of a large number of "blacklisted" speakers. Spoofing countermeasures, which discriminate 'fake' signals from authentic ones, are also an important issue.
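To make the unsupervised view of diarization concrete, the sketch below clusters per-segment embeddings with a simple average-linkage agglomerative procedure that stops merging once the best pair falls below a similarity threshold, so the number of speakers does not need to be known in advance. The segment embeddings, the function names, and the 0.5 threshold are assumptions made for illustration; this is one common style of approach, not a description of the group's system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cluster_segments(embeddings, threshold=0.5):
    """Return one speaker label per segment via average-linkage clustering."""
    clusters = [[i] for i in range(len(embeddings))]  # start with one cluster per segment

    def avg_sim(c1, c2):
        total = sum(cosine(embeddings[i], embeddings[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                s = avg_sim(clusters[p], clusters[q])
                if best is None or s > best[0]:
                    best = (s, p, q)
        if best[0] < threshold:
            break  # remaining clusters are treated as distinct speakers
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy embeddings for four consecutive segments, invented for illustration.
segments = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]
labels = cluster_segments(segments)
print(labels)
```

The stopping threshold plays the role of the unknown speaker count: a lower value merges more aggressively and yields fewer speakers.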

Combined with other speech processing techniques, a speaker's identity can be used for speaker adaptation of the acoustic model in automatic speech recognition systems. Text-to-speech synthesizers and voice conversion systems also need the speaker's identity to produce the target speaker's voice from a reference speaker.

Group

Spoken Language Systems Group

Contact us

If you would like to contact us about our work, please refer to our members below and reach out to one of the group leads directly.

Last updated Jun 21 '18

Research Areas

AI & ML

Members

Jim Glass