Advances in Speaker Diarization at Google

Speaker

Quan Wang

Google

Host

Jim Glass

Abstract:
In this talk, Dr. Quan Wang will introduce the development and evolution of speaker diarization technologies at Google over the past decade, and how they landed in products such as Cloud Speech-to-Text and the Pixel Recorder app. The talk will cover four critical milestones of speaker diarization at Google: (1) leveraging deep speaker embeddings; (2) leveraging supervised clustering; (3) leveraging sequence transducers; and (4) leveraging large language models. The talk will also discuss how speaker diarization will evolve in the new era of multimodal large language models.

Bio:
Dr. Quan Wang is a Senior Staff Software Engineer at Google, leading the Hotword Research team, the Hotword Quality team, and the Speaker, Voice & Language team. Quan is an IEEE Senior Member and was formerly a Machine Learning Scientist on the Amazon Alexa team. Quan received his B.E. degree from Tsinghua University and his Ph.D. degree from Rensselaer Polytechnic Institute. Quan is the author of the Chinese textbook "Voice Identity Techniques: From core algorithms to engineering practice", winning the Distinguished Author of Year 2020 Award. Quan is also an online instructor, and his Speaker Recognition course is rated as a bestselling course on Udemy.

Date

March 21, 2024, 2:00 PM to 3:00 PM (America/New_York)

Location

32-G449 (Kiva/Patil Conference Room), Stata Center

Organizer & Contact

Marcia G. Davidson

marcia@csail.mit.edu

617-253-3049
