Speaking Out "Cloud": How Cloud Computing and Mashups are Fostering Multimodal Mobile Services

Speaker: Giuseppe Di Fabbrizio, AT&T Labs - Research
Date: November 2, 2009
Time: 11:00AM to 12:00PM
Location: 32-G882 (Stata Center - 8th Floor Reading Room)
Host: Jim Glass, MIT CSAIL
Contact: Marcia Davidson, 617-253-3049, marcia@csail.mit.edu
Relevant URL:
Speech is becoming a more attractive interface for mobile devices because it overcomes the input limitations of these devices and is safer for multitasking users. Speech is also a direct, intuitive interface that requires no learning. With the proliferation of web content, from business searches and mapping services to game applications, it makes sense to combine, or mash up, speech interfaces with web services.
However, small devices have limited computational capabilities for the speech processing tasks that speech interfaces require, including automatic speech recognition and text-to-speech conversion, especially when large vocabularies or high quality synthesis are involved. One popular solution is to move the speech processing resources into the network by concentrating the heavy computational load in server farms. Some successful services exploit this approach, but to date each performs a single specific task; it is unclear how easily they can be extended to other tasks or whether they can scale to accommodate large deployments.
To address these challenges, we introduce the AT&T speech mashup architecture, a novel approach that leverages web services and cloud computing to make it easier to combine web content with a speech interface. We show that this new compositional method is suitable for integrating automatic speech recognition, text-to-speech synthesis, natural language understanding and multimodal understanding technologies into real multimodal mobile services. The generality of this method allows researchers and speech practitioners to explore a wide variety of mobile multimodal services with a finer grain of control and richer multimedia interfaces. Moreover, we demonstrate that the speech mashup is scalable and reduces network latency for a better user experience.
Giuseppe Di Fabbrizio is a Lead Member of Research Staff in the IP & Voice Services Research Laboratory at AT&T Labs - Research in Florham Park, NJ. During his career, he has conducted research on multimodal and spoken dialog systems, conversational agents, natural language generation, multimodal and speech system architectures, platforms and services, publishing more than fifty conference and journal papers on these subjects. He was instrumental in the development and deployment of the AT&T VoiceTone(R) Dialog Automation product for AT&T business enterprise customers and is the recipient of the 2008 AT&T Science and Technology Medal Award for outstanding technical innovation and leadership in the advancement of spoken language technologies, architectures, and services. Di Fabbrizio is a senior member of the Institute of Electrical and Electronics Engineers (IEEE) and an elected member (2009-2011) of the IEEE Signal Processing Society's "Speech and Language Processing Technical Committee" (SLTC) in the area of dialog systems. He serves as editor of the SLTC’s quarterly newsletter and contributes as a program committee member and technical reviewer for numerous international conferences, journals, and workshops. Prior to joining AT&T, he worked as a Senior Researcher at Telecom Italia Lab (formerly CSELT, now mostly Loquendo).