Anticipating More from Cortana

Published

Most of us can only dream of having the perfect personal assistant, one who is always there when needed, anticipating our every request and unobtrusively organizing our lives. Cortana, the new digital personal assistant powered by Bing that comes with Windows Phone 8.1, brings users closer to that dream.

For Larry Heck, a distinguished engineer in Microsoft Research, this first release offers a taste of what he has in mind. Over time, Heck wants Cortana to interact in an increasingly anticipatory, natural manner.

Cortana already offers some of this behavior. Rather than just performing voice-activated commands, Cortana continually learns about its user and becomes increasingly personalized, with the goal of proactively carrying out the right tasks at the right time. If its user asks about outside temperatures every afternoon before leaving the office, Cortana will learn to offer that information without being asked.

Microsoft Research Blog

Microsoft Research Forum Episode 3: Globally inclusive and equitable AI, new use cases for AI, and more

In the latest episode of Microsoft Research Forum, researchers explored the importance of globally inclusive and equitable AI, shared updates on AutoGen and MatterGen, presented novel use cases for AI, including industrial applications and the potential of multimodal models to improve assistive technologies.

Furthermore, if given permission to access phone data, Cortana can read calendars, contacts, and email to improve its knowledge of context and connections. Heck, who plays classical trumpet in a local orchestra, might receive a calendar update about a change in rehearsal time. Cortana would let him know about the change and alert him if the new time conflicts with another appointment.

Research Depth and Breadth an Advantage

While many people would categorize such logical associations and humanlike behaviors under the term “artificial intelligence” (AI), Heck points to the diversity of research areas that have contributed to Cortana’s underlying technologies. He views Cortana as a specific expression of Microsoft Research’s work on different areas of personal-assistant technology.

“The base technologies for a virtual personal assistant include speech recognition, semantic/natural language processing, dialogue modeling between human and machines, and spoken-language generation,” he says. “Each area has in it a number of research problems that Microsoft Research has addressed over the years. In fact, we’ve pioneered efforts in each of those areas.”

Cortana user interface

The Cortana user interface.

Cortana’s design philosophy is therefore entrenched in state-of-the-art machine-learning and data-mining algorithms. Furthermore both developers and researchers are able to use Microsoft’s broad assets across commercial and enterprise products, including strong ties to Bing web search and Microsoft speech algorithms and data.

If Heck has set the bar high for Cortana’s future, it’s because of the deep, varied expertise within Microsoft Research.

“Microsoft Research has a long and broad history in AI,” he says. “There are leading scientists and pioneers in the AI field who work here. The underlying vision for this work and where it can go was derived from Eric Horvitz’s work on conversational interactions and understanding, which go as far back as the early ’90s. Speech and natural language processing are research areas of long standing, and so is machine learning. Plus, Microsoft Research is a leader in deep-learning and deep-neural-network research.”

From Foundational Technology to Overall Experience

In 2009, Heck started what was then called the conversational-understanding (CU) personal-assistant effort at Microsoft.

“I was in the Bing research-and-development team reporting to Satya Nadella,” Heck says, “working on a technology vision for virtual personal assistants. Steve Ballmer had recently tapped Zig Serafin to unify Microsoft’s various speech efforts across the company, and Zig reached out to me to join the team as chief scientist. In this role and working with Zig, we began to detail out a plan to build what is now called Cortana.”

Cortana product researchers

Researchers who worked on the Cortana product (from left): top row, Malcolm Slaney, Lisa Stifelman, and Larry Heck; bottom row, Gokhan Tur, Dilek Hakkani-Tür, and Andreas Stolcke.

Heck and Serafin established the vision, mission, and long-range plan for Microsoft’s digital-personal-assistant technology, based on scaling conversations to the breadth of the web, and they built a team with the expertise to create the initial prototypes for Cortana. As the effort got off the ground, Heck’s team hired and trained several Ph.D.-level engineers for the product team to develop the work.

“Because the combination of search and speech skills is unique,” Heck says, “we needed to make sure that Microsoft had the right people with the right combination of skills to deliver, and we hired the best to do it.”

After the team was in place, Heck and his colleagues joined Microsoft Research to continue to think long-term, working on next-generation personal-assistant technology.

Some of the key researchers in these early efforts included Microsoft Research senior researchers Dilek Hakkani-Tür and Gokhan Tur, and principal researcher Andreas Stolcke. Other early members of Heck’s team included principal research software developer Madhu Chinthakunta, and principal user-experience designer Lisa Stifelman.

“We started out working on the low-level, foundational technology,” Heck recalls. “Then, near the end of the project, our team was doing high-level, all-encompassing usability studies that provided guidance to the product group. It was kind of like climbing up to the crow’s nest of a ship to look over the entire experience.

“Research manager Geoff Zweig led usability studies in Microsoft Research. He brought people in, had them try out the prototype, and just let them go at it. Then we would learn from that. Microsoft Research was in a good position to study usability, because we understood the base technology as well as the long-term vision and how things should work.”

The Long-Term View

Heck has been integral to Cortana since its inception, but even before coming to Microsoft in 2009, he already had contributed to early research on CU personal assistants. While at SRI International in the 1990s, his tenure included some of the earliest work on deep-learning and deep-neural-network technology.

Heck was also part of an SRI team whose efforts laid the groundwork for the CALO AI project funded by the U.S. government’s Defense Advanced Research Projects Agency. The project aimed to build a new generation of cognitive assistants that could learn from experience and reason intelligently under ambiguous circumstances. Later roles at Nuance Communications and Yahoo! added expertise in research areas vital to contributing to making Cortana robust.

notebook menu for Cortana

The notebook menu for Cortana.

Not surprisingly, Heck’s perspectives extend to a distant horizon.

“I believe the personal-assistant technology that’s out there right now is comparable to the early days of search,” he says, “in the sense that we still need to grow the breadth of domains that digital personal assistants can cover. In the mid-’90s, before search, there was the Yahoo! directory. It organized information, it was popular, but as the web grew, the directory model became unwieldy. That’s where search came in, and now you can search for anything that’s on the web.”

He sees personal-assistant technology traveling along a similar trajectory. Current implementations target the most common functions, such as reminders and calendars, but as technology matures, the personal assistant has to extend to other domains so that users can get any information and conduct any transaction anytime and anywhere.

“Microsoft has intentionally built Cortana to scale out to all the different domains,” Heck says. “Having a long-term vision means we have a long-term architecture. The goal is to support all types of human interaction—whether it’s speech, text, or gestures—across domains of information and function and make it as easy as a natural conversation.”

Continue reading

See all blog posts