Advances in Speaker Diarization at Google
Speaker
Quan Wang
Google
Host
Jim Glass
Abstract:
In this talk, Dr. Quan Wang will introduce the development and evolution of speaker diarization technologies at Google over the past decade, and how they landed in impactful products such as Cloud Speech-to-Text and the Pixel Recorder app. The talk will cover four critical milestones of speaker diarization at Google: (1) leveraging deep speaker embeddings; (2) leveraging supervised clustering; (3) leveraging sequence transducers; and (4) leveraging large language models. The talk will also discuss how speaker diarization will evolve in the new era of multimodal large language models.
Bio:
Dr. Quan Wang is a Senior Staff Software Engineer at Google, leading the Hotword Research team, the Hotword Quality team, and the Speaker, Voice & Language team. Quan is an IEEE Senior Member and was formerly a Machine Learning Scientist on the Amazon Alexa team. Quan received his B.E. degree from Tsinghua University and his Ph.D. degree from Rensselaer Polytechnic Institute. Quan is the author of the Chinese textbook "Voice Identity Techniques: From core algorithms to engineering practice", winning the Distinguished Author of Year 2020 Award. Quan is also an online instructor, and his Speaker Recognition course is rated as a bestselling course on Udemy.