Xiaomi AI Lab is developing a text-to-speech technology that would allow people with speech disorders to customise their voice for daily communications.
The project, called “Own My Voice”, seeks to provide a “spontaneous-style” text-to-speech technology, which essentially makes the synthesised voice sound like a real person in its intonation, pauses, speed and other features. It hopes to replace the monotonous electronic voice with a more natural one.
Currently, this technology is used in many smart devices equipped with Xiaomi's AI voice assistant. The "Own My Voice" project shows that spontaneous-style text-to-speech technology can also be widely adopted in accessibility settings and improve user experience.
“We are excited to explore the multiple values that technology innovation brings to us, such as responding to users' demands for self-identity and the construction of identity,” said Zhu Xi, technology committee topic convener for the Tech for Good initiative at Xiaomi Corporation.
The backbone of the project is a group of speech technology experts from Xiaomi AI Lab, who have published 37 papers on speech at top international conferences, such as the International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
Deploying a voiceprint matching algorithm
To generate the most suitable and personalised voice for intended users, Xiaomi AI Lab gathered the voices of 200 volunteers from within the company.
The project team then deployed a voiceprint matching algorithm to match the features of the donated voices with those of the recipient’s voice. Through this approach, they found the most suitable voice to serve as the base reference voice for the recipient.
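The article does not say how the matching works internally. As a rough illustration only, one common way to compare voiceprints is to represent each voice as a fixed-length embedding vector and pick the donor whose vector has the highest cosine similarity to the recipient's. The sketch below assumes that approach; the function names, the donor labels and the toy three-dimensional vectors are all hypothetical and are not taken from Xiaomi's system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def best_matching_donor(recipient, donors):
    """Return (donor_id, score) for the donor most similar to the recipient."""
    if not donors:
        raise ValueError("at least one donor embedding is required")
    scores = {
        donor_id: cosine_similarity(recipient, embedding)
        for donor_id, embedding in donors.items()
    }
    best_id = max(scores, key=scores.get)
    return best_id, scores[best_id]


# Hypothetical toy embeddings; a real speaker embedding would have
# hundreds of dimensions and come from a trained speaker-encoder model.
recipient = [0.9, 0.1, 0.3]
donors = {
    "donor_a": [0.1, 0.9, 0.2],
    "donor_b": [0.8, 0.2, 0.4],
    "donor_c": [-0.5, 0.4, 0.1],
}

best_id, score = best_matching_donor(recipient, donors)
print(best_id, round(score, 3))
```

In this toy data the vector for donor_b points in nearly the same direction as the recipient's, so it is selected. A production system would more likely rank all 200 donated voices and weigh additional criteria, which the article does not detail.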
For the sake of personalisation and privacy protection, the chosen real voice was then altered through complex acoustic modification to form a new, original voice. After this, the Xiaomi project team used spontaneous-style text-to-speech technology to train an AI model so that it gradually acquires a natural rhythm and intonation that can express human emotion and tone.
“If we notice and address the needs of minority groups at an early stage, the process of technology diffusion could be greatly shortened. This allows the benefits of new technologies to become accessible to users with special needs without delay,” said Zhu Xi.
The "Own My Voice" project combines a variety of advanced algorithms with Xiaomi's self-developed speech technology to ensure the specificity, safety and high level of authenticity of the synthesised voice.
Moving forward, Xiaomi will continue receiving feedback from the voice recipient, and further study the feasibility of this project for a wider range of applications.