The speech signal carries information about the talker's identity, which can be used on its own, or in conjunction with other modalities, to recognize that person.
In our research, we aim to develop speech processing methods that determine a person's identity from a recorded speech signal. We are exploring deep learning methods that learn a low-dimensional speaker embedding space to help with this task. Current challenges we are addressing include speaker verification from short recordings, under adverse conditions, and with limited training examples.
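To illustrate how a speaker embedding space can support verification, here is a minimal sketch that compares two embeddings by cosine similarity and accepts the identity claim when the score exceeds a threshold. The function names, the toy 3-dimensional vectors, and the threshold of 0.7 are all assumptions made for illustration. In practice the embeddings would come from a trained neural network, and the threshold would be tuned on held-out data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(enrolled, test, threshold=0.7):
    """Accept the identity claim when the two embeddings are similar enough.

    The default threshold is a hypothetical value chosen for this example.
    """
    return cosine_similarity(enrolled, test) >= threshold


# Toy embeddings, made up for illustration only.
enrolled = [0.9, 0.1, 0.2]
same_speaker = [0.8, 0.2, 0.1]
other_speaker = [-0.1, 0.9, 0.3]

print(verify(enrolled, same_speaker))
print(verify(enrolled, other_speaker))
```

Cosine scoring is only one possible back end; it is shown here because it needs no training beyond the embedding extractor itself.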
A related problem is speaker diarization, the task of determining who is speaking when in a long audio recording such as a meeting. When the number and identities of the speakers are not known ahead of time, this is a form of unsupervised learning. There is also the multi-target detection and identification task, which aims to determine whether or not a recorded utterance was spoken by one of a large number of "blacklisted" speakers. Spoofing countermeasures are also important for discriminating 'fake' signals from authentic ones.
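The unsupervised nature of diarization can be sketched with a simple greedy clustering of per-segment speaker embeddings: each segment joins the most similar existing cluster, or starts a new one if no cluster is similar enough, so the number of speakers does not have to be known in advance. This is an illustrative assumption rather than the method used in this research. The function names, the toy 2-dimensional embeddings, and the threshold of 0.8 are hypothetical, and practical systems typically rely on more robust clustering such as agglomerative or spectral methods.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def diarize(segment_embeddings, threshold=0.8):
    """Assign a speaker label to each segment without a known speaker count.

    Each cluster is represented by the running sum of its members, which
    points in the same direction as their mean.
    """
    cluster_sums = []
    labels = []
    for emb in segment_embeddings:
        scores = [_cosine(emb, c) for c in cluster_sums]
        best = max(range(len(scores)), key=scores.__getitem__) if scores else None
        if best is not None and scores[best] >= threshold:
            # Similar enough: merge the segment into the existing cluster.
            cluster_sums[best] = [c + e for c, e in zip(cluster_sums[best], emb)]
            labels.append(best)
        else:
            # No match: treat the segment as a newly discovered speaker.
            cluster_sums.append(list(emb))
            labels.append(len(cluster_sums) - 1)
    return labels


# Toy embeddings for five consecutive segments, made up for illustration.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95], [1.0, 0.05]]
labels = diarize(segments)
print(labels)
```

The returned labels are arbitrary cluster indices, not identities; mapping them to named speakers would require enrollment data.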
Combined with other speech processing techniques, speaker identity can be used for speaker adaptation of acoustic models in automatic speech recognition systems. Text-to-speech synthesizers and voice conversion systems also need speaker identity information to produce the target speaker's voice from a reference speaker.