Making machines speak like people

In the 1999 American film Bicentennial Man, the late Robin Williams played a robot who strives to achieve the physical, social and legal status of a human being. The character’s growing language capabilities—his capacity to communicate fluently with his human family—proved crucial in his quest. But long before Williams donned his robot suit, people were dreaming about talking with machines naturally, conversing with them as they would with another person.

Earlier this year, Microsoft Korea hosted a roundtable on “Research on Signal Processing and Speech,” describing recent work on human-machine natural language communication. The research, a collaborative effort between Yonsei University and Microsoft Research, was led by Professor Hong-Goo Kang of Yonsei’s School of Electrical and Electronic Engineering. Its ultimate goal is to make natural conversation between humans and machines possible.

The roundtable highlighted signal processing and speech research using DNNs. (Pictured in bottom row: Professor Hong-Goo Kang, Yonsei University [left], and Miran Lee, Microsoft Research [right])

The team focused on voice synthesis and text-to-speech (TTS) conversion, two elements crucial to achieving fluent, natural-sounding machine speech. The mechanical, depersonalized voice of machines had been a limitation of previous TTS technologies, according to Kang, which is why the team focused on TTS technology based on deep neural networks (DNNs). DNNs attempt to replicate the neural networks of the human brain, particularly the way neurons communicate with one another. In doing so, DNNs enable a sophisticated type of machine learning that researchers call deep learning. Deep learning should allow machines to understand human speech and respond more relevantly, with a more natural-sounding voice.
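To give a rough sense of how a DNN can be used for speech synthesis, the sketch below is a minimal, hypothetical example, not the team’s actual system: a small feed-forward network that maps per-frame, text-derived linguistic features to mel-spectrogram frames, which a separate vocoder would then turn into audio. All layer sizes and feature dimensions here are assumptions chosen purely for illustration.

```python
# Minimal illustrative sketch of a DNN acoustic model for TTS (hypothetical,
# not the actual research model). It maps frame-level linguistic features
# derived from text to mel-spectrogram frames.
import torch
import torch.nn as nn

class DNNAcousticModel(nn.Module):
    def __init__(self, linguistic_dim=300, hidden_dim=512, mel_dim=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(linguistic_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, mel_dim),  # one predicted mel frame per input frame
        )

    def forward(self, linguistic_features):
        # linguistic_features: (num_frames, linguistic_dim), derived from the input text
        return self.net(linguistic_features)

model = DNNAcousticModel()
frames = torch.randn(100, 300)   # 100 frames of (hypothetical) text-derived features
mel = model(frames)              # (100, 80) acoustic frames for a vocoder to render as audio
```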

“The copyright of the research result belongs to me, but other IT companies and everyone else can share it,” said Kang. “It’s hard to conduct this kind of long-term project with just the resources in academia. Therefore, we must work with companies, which is why collaboration with Microsoft Research was so meaningful.” Microsoft Research also offered an internship to one of Kang’s students, who subsequently published his research and presented it at an international conference.

This collaboration is indicative of our commitment to create an ecosystem that connects companies and academic institutions, and our ongoing efforts to foster talented young computer-science researchers.

—Miran Lee, Principal Research Program Manager, Microsoft Research
