In our research we aim to develop speech processing methods that determine a person's identity from a recorded speech signal. We are exploring deep learning methods that learn a low-dimensional speaker embedding space to support this task. Current challenges we are addressing include speaker verification from short recordings, under adverse acoustic conditions, and with limited training examples.
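Once speech has been mapped into a speaker embedding space, verification typically reduces to comparing two embeddings and thresholding their similarity. The following is a minimal sketch of that decision step, using toy hand-written vectors in place of embeddings produced by a trained network; the function names, dimensions, and threshold value are illustrative assumptions, not our actual system.

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enroll, test, threshold=0.7):
    """Accept the identity claim if the test embedding is close enough
    to the enrollment embedding. The threshold is a tunable parameter."""
    return cosine_score(enroll, test) >= threshold

# Toy 4-dimensional embeddings; real systems use embeddings with
# hundreds of dimensions produced by a trained neural network.
spk_a  = np.array([0.9, 0.1, 0.0, 0.1])
spk_a2 = np.array([0.8, 0.2, 0.1, 0.1])  # same speaker, new recording
spk_b  = np.array([0.0, 0.1, 0.9, 0.2])  # a different speaker

print(verify(spk_a, spk_a2))  # True: embeddings are close
print(verify(spk_a, spk_b))   # False: embeddings are far apart
```

Short or noisy recordings make the embeddings less reliable, which is why verification under those conditions remains challenging.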
A related problem is speaker diarization: determining who is speaking when in a long audio recording such as a meeting. When neither the number nor the identities of the speakers are known ahead of time, this is a form of unsupervised learning.
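One common way to frame diarization is to embed short audio segments and cluster the embeddings, letting the number of clusters (speakers) emerge from the data. The sketch below uses a simple greedy clustering over toy 2-D vectors to illustrate the idea; the threshold, centroid update, and vectors are illustrative assumptions, not a description of our method.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two segment embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def diarize(segment_embs, threshold=0.8):
    """Greedy clustering: each segment joins the most similar existing
    speaker cluster, or starts a new one when no centroid is similar
    enough. The number of speakers is discovered, not given."""
    centroids, labels = [], []
    for emb in segment_embs:
        if centroids:
            sims = [cos(emb, c) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                labels.append(best)
                # Running average keeps the centroid representative.
                centroids[best] = (centroids[best] + emb) / 2.0
                continue
        centroids.append(emb.astype(float))
        labels.append(len(centroids) - 1)
    return labels

# Toy 2-D "embeddings" for five consecutive segments.
segs = [np.array(v) for v in
        [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.1]]]
print(diarize(segs))  # [0, 0, 1, 1, 0]: two speakers found
```

Real systems refine this with stronger clustering algorithms and resegmentation, but the unsupervised character of the problem is the same: the speaker inventory must be inferred from the recording itself.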