The speech signal contains information about the talker's identity, which can be used on its own, or in conjunction with other modalities, to determine a person's identity.

In our research we aim to develop speech processing methods that allow us to determine a person's identity from a recorded speech signal.  We are exploring deep learning methods that learn a low-dimensional speaker embedding space to support this task.  Current challenges we are addressing include speaker verification from short recordings, adverse acoustic conditions, and limited training examples.
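In an embedding-based approach, verification typically reduces to comparing two embedding vectors and accepting the identity claim when they are close enough. The sketch below illustrates this with cosine similarity; the random vectors, dimensionality, and threshold are placeholders standing in for a trained network's output, not part of any specific system described here.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enroll_emb, test_emb, threshold=0.7):
    """Accept the identity claim when the embeddings are close enough.

    The threshold is illustrative; in practice it is tuned on held-out
    trials to balance false accepts against false rejects.
    """
    return cosine_similarity(enroll_emb, test_emb) >= threshold

# Toy embeddings standing in for a trained network's output.
rng = np.random.default_rng(0)
speaker_a = rng.normal(size=128)
same_speaker = speaker_a + 0.05 * rng.normal(size=128)  # small intra-speaker variation
other_speaker = rng.normal(size=128)                    # an unrelated speaker

print(verify(speaker_a, same_speaker))   # same speaker: high similarity
print(verify(speaker_a, other_speaker))  # different speaker: low similarity
```

In a real system the embeddings would come from a network trained so that recordings of the same talker cluster together; the comparison step itself stays this simple.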

A related problem is speaker diarization, the task of determining who is speaking when in a long audio recording such as a meeting.  When neither the number nor the identities of the speakers are known ahead of time, this becomes a form of unsupervised learning.
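One common way to frame diarization is to embed short speech segments and cluster them, letting the number of clusters (speakers) emerge from the data. The greedy clustering sketch below is a simplified illustration under that framing; the threshold, toy embeddings, and greedy assignment rule are assumptions for the example, not a description of any particular diarization system.

```python
import numpy as np

def diarize(segment_embs, threshold=0.5):
    """Greedy clustering of per-segment embeddings.

    Assign each segment to the closest existing cluster centroid if its
    cosine similarity exceeds the threshold; otherwise start a new
    cluster. The number of speakers is not fixed in advance.
    """
    centroids, labels = [], []
    for emb in segment_embs:
        emb = emb / np.linalg.norm(emb)
        sims = [float(np.dot(emb, c)) for c in centroids]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))
        else:
            centroids.append(emb)
            labels.append(len(centroids) - 1)
    return labels

# Two toy "speakers" alternating across four segments.
rng = np.random.default_rng(1)
a, b = rng.normal(size=64), rng.normal(size=64)
segments = [a, b, a + 0.05 * rng.normal(size=64), b + 0.05 * rng.normal(size=64)]
print(diarize(segments))  # segments 0 and 2 share a label, as do 1 and 3
```

Production diarization pipelines typically add voice activity detection, more robust clustering (e.g. agglomerative or spectral), and resegmentation, but the core unsupervised-clustering idea is the same.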

Research Areas