Making machines speak like people


In the 1999 American film Bicentennial Man, the late Robin Williams played a robot who strives to achieve the physical, social and legal status of a human being. The character’s growing language capabilities—his capacity to communicate fluently with his human family—proved crucial in his quest. But long before Williams donned his robot suit, people were dreaming about talking with machines naturally, conversing with them as they would with another person.

Earlier this year, Microsoft Korea hosted a roundtable on “Research on Signal Processing and Speech,” describing recent work on human-machine natural language communication. The research, a collaborative effort between Yonsei University and Microsoft Research, was led by Professor Hong-goo Kang of Yonsei’s School of Electrical and Electronic Engineering. Its ultimate goal is to make natural conversation between humans and machines possible.

The roundtable highlighted signal processing and speech research using DNNs.
(pictured in bottom row: Professor Hong-goo Kang, Yonsei University [left], and Miran Lee, Microsoft Research [right])

The team focused on voice synthesis and text-to-speech (TTS) conversion, two elements crucial to achieving fluent, natural-sounding machine speech. The mechanical, depersonalized voice of machines had been a limitation of previous TTS technologies, according to Kang, which is why the team pursued TTS based on deep neural networks (DNNs). DNNs attempt to replicate the neural networks of the human brain, particularly the way neurons communicate with one another. In doing so, DNNs enable a sophisticated type of machine learning that researchers call deep learning. Deep learning should allow machines to understand human speech and respond more relevantly, with a more natural-sounding voice.
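The idea behind a DNN-based acoustic model for TTS can be sketched in a very simplified form: linguistic features derived from text are passed through stacked layers of weighted connections to predict acoustic parameters for each speech frame. The sketch below is illustrative only; the layer sizes, feature vectors, and random weights are assumptions for demonstration and do not reflect the team's actual architecture.

```python
import math
import random

random.seed(0)


def dense(x, W, b):
    # One fully connected layer with tanh activation: y = tanh(W x + b)
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def init_layer(n_out, n_in):
    # Small random weights and zero biases (untrained, for illustration)
    W = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b


# Hypothetical sizes: 5 linguistic features in, 8 hidden units,
# 3 acoustic parameters out (e.g., pitch, energy, duration).
W1, b1 = init_layer(8, 5)
W2, b2 = init_layer(3, 8)


def tts_acoustic_model(linguistic_features):
    # In a real system, text analysis produces these features per frame;
    # here we simply map them through two layers.
    hidden = dense(linguistic_features, W1, b1)
    return dense(hidden, W2, b2)


frame = [1.0, 0.0, 0.2, 0.7, 0.0]  # toy per-frame linguistic features
params = tts_acoustic_model(frame)
print(len(params))  # 3 acoustic parameters for this frame
```

In a full pipeline, the predicted acoustic parameters would then drive a vocoder to generate the waveform; training such a network on recorded speech is what lets it produce a less mechanical, more natural-sounding voice.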

“The copyright of the research result belongs to me, but other IT companies and everyone else can share it,” said Kang. “It’s hard to conduct this kind of long-term project with just the resources in academia. Therefore, we must work with companies, which is why collaboration with Microsoft Research was so meaningful.” Microsoft Research also offered an internship to one of Kang’s students, who subsequently published his research and presented it at an international conference.

This collaboration is indicative of our commitment to create an ecosystem that connects companies and academic institutions, and our ongoing efforts to foster talented young computer-science researchers.

—Miran Lee, Principal Research Program Manager, Microsoft Research
