Chinese smart EV maker XPeng has upgraded its automotive-grade voice assistant using Microsoft's custom neural voice capability, which is based on Neural Text-to-Speech (TTS), a feature of Azure AI.
XPeng has already rolled out the new voice assistant technology to P7 customers across China via a major over-the-air (OTA) upgrade. The company plans to introduce future generations of the upgraded voice assistant into other production models.
The carmaker worked with Microsoft to overcome several key challenges to create the new cutting-edge voice assistant integration.
To cope with cellular network jitter while the car is moving, reduce data traffic and hardware load, and keep speech continuous and high in quality, XPeng introduced context-specific multi-level caches that store high-quality audio in advance and distribute it to minimise reliance on the network.
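The article does not describe how XPeng's caches are built, so the following is only a minimal sketch of the general multi-level caching idea, under stated assumptions: a small in-memory tier sits in front of a larger local store, and the network is used only when both miss. The class `MultiLevelSpeechCache`, its methods, and the stand-in `fake_synthesize` function are all hypothetical names chosen for illustration; they are not part of any XPeng or Azure API.

```python
from collections import OrderedDict
from typing import Callable, Dict, Iterable


class MultiLevelSpeechCache:
    """Hypothetical two-level cache: small in-memory LRU over a larger local store."""

    def __init__(self, fetch_remote: Callable[[str], bytes], memory_capacity: int = 2) -> None:
        self._fetch_remote = fetch_remote            # stand-in for a cloud TTS request
        self._memory: "OrderedDict[str, bytes]" = OrderedDict()  # level 1: fast, small
        self._local_store: Dict[str, bytes] = {}     # level 2: stand-in for on-board storage
        self._memory_capacity = memory_capacity
        self.remote_calls = 0                        # counts how often the network was used

    def _remote(self, phrase: str) -> bytes:
        self.remote_calls += 1
        return self._fetch_remote(phrase)

    def _remember(self, phrase: str, audio: bytes) -> None:
        # Insert into level 1 and evict the least recently used entry if it is full.
        self._memory[phrase] = audio
        self._memory.move_to_end(phrase)
        while len(self._memory) > self._memory_capacity:
            self._memory.popitem(last=False)

    def preload(self, phrases: Iterable[str]) -> None:
        # Fetch audio ahead of time (for example while connectivity is good).
        for phrase in phrases:
            if phrase not in self._local_store:
                self._local_store[phrase] = self._remote(phrase)

    def get(self, phrase: str) -> bytes:
        if phrase in self._memory:                   # level 1 hit
            self._memory.move_to_end(phrase)
            return self._memory[phrase]
        audio = self._local_store.get(phrase)        # level 2 lookup
        if audio is None:                            # both levels missed: use the network
            audio = self._remote(phrase)
            self._local_store[phrase] = audio
        self._remember(phrase, audio)
        return audio


def fake_synthesize(phrase: str) -> bytes:
    # Assumption: placeholder for real synthesis; returns the text as bytes.
    return phrase.encode("utf-8")


cache = MultiLevelSpeechCache(fake_synthesize, memory_capacity=2)
cache.preload(["Navigation started", "Battery low"])
audio = cache.get("Battery low")  # served from the local store, no new network call
```

In this sketch, preloaded phrases are answered without touching the network, which is the property the paragraph above attributes to caching audio in advance.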
To deliver natural-sounding, high-fidelity speech, XPeng uses Microsoft Azure together with caching and compression to provide audio at a sampling rate of 24 kHz and a quantization level of 16 bits without overburdening the data network or the car’s own CPU. XPeng also worked with Microsoft to minimise ambiguity and optimise accuracy in the voice assistant’s speech.
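To give a sense of why caching and compression matter at this quality level, the short calculation below works out the raw data rate implied by the sampling rate and bit depth quoted above. It assumes uncompressed single-channel (mono) PCM audio, which the article does not state; the function name `pcm_bytes_per_second` is illustrative only.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


# 24,000 samples/s at 16 bits each, assuming mono audio
rate = pcm_bytes_per_second(24_000, 16)   # 48,000 bytes per second
five_second_prompt = rate * 5             # 240,000 bytes for a 5-second prompt
```

Under these assumptions, uncompressed speech needs roughly 48 KB for every second of audio, so streaming every prompt over a mobile connection would add up quickly.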
"This is a cutting-edge exploration of vehicle voice interaction in the auto industry," said Hao Chao, senior expert with XPeng Automotive AI Products. “It required months of dedicated work by our team to overcome the challenges, and now delivers a whole new level of natural speech. With a deep understanding of urban mobility, we are finding many more scenarios to leverage AI technology for a high level of driver-machine intuition.”
“With advancements in research and technology, Azure Cognitive Services like vision and speech, will play a pivotal role in defining unique in-vehicle experiences,” said Sanjay Ravi, general manager, Automotive, Mobility, and Transportation Industry at Microsoft. “With speech as a primary interaction tool within the vehicle, Microsoft’s custom neural voice services enable automakers to develop their own differentiated and authentic branded experiences.”
Microsoft research breakthroughs in speech, natural language and machine translation have significantly advanced the fluency, quality, fidelity and naturalness of voice assistant technology over the past several years. These innovations have been integrated into commercially available speech and language capabilities within Azure Cognitive Services and other Microsoft products, so that companies like XPeng can bring richer, more engaging experiences to their customers.