This project develops conversational technology for data analytics scenarios. Given a relational database or a data table, a user can explore the data and discover insights from the dataset through natural language conversation. Our system understands the user's natural language questions and converts them into analysis programs. These programs are executed on the relational database (or the data table), and the results are returned to the user in the form of a visualization and a natural language description. Conversational analytics can significantly lower the bar for general users to conduct data analysis. It helps organizations analyze structured data (e.g., tabular datasets, relational databases, knowledge bases, etc.) and deliver personalized reporting and intelligence.
We research and develop AI and NLP technologies and experience designs to support conversational analytics. This is a cross-disciplinary research project that includes the following research topics:
- Semantic parsing (also called NL2Code) is the technology of converting a natural language utterance into a logical form (or program) that is a machine-understandable meaning representation of the utterance. Different applications may need different meaning representations. Here, a domain-specific meaning representation (that is, a DSL) can be SQL, a kind of formula, a regular expression, lambda calculus, or another logical form. In this project, we propose a unified semantic parsing framework that supports different DSLs for different scenarios. Many problems in natural language understanding are also important to a practical semantic parsing system, such as conversation context modeling and ambiguity handling.
- Knowledge computing. In practice, common knowledge or domain-specific knowledge is often essential to understanding a natural language utterance in an application scenario. Therefore, research on knowledge representation, knowledge mining/enhancement, and knowledge integration is also very important.
- Advanced machine learning approaches. Machine learning algorithms play a critical role in our semantic parsing and knowledge computing technologies, and developing advanced NLU technologies often requires advanced machine learning algorithms. Recently, deep learning algorithms that learn from large natural language datasets have helped conversational analytics support complex multi-step queries.
- Experience design. User interaction design is a very important factor in the success of a conversational data analysis system. Natural language is often ambiguous, and state-of-the-art NL technologies are often not 100% accurate; a good interaction design can mitigate these problems. In addition, a traditional visual interface is also very convenient and efficient in some scenarios. We need a good interaction design that seamlessly integrates the NL interface with the traditional visual interface for a better user experience.
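The flow described above (question, then program, then execution, then a natural language description) can be illustrated with a toy sketch. This is not the project's parser: it swaps in a hand-written regular-expression rule for a learned semantic parser, uses SQL as the DSL, and runs on an in-memory SQLite table. The table name `sales`, its columns, its rows, and every function name below are assumptions made up for illustration.

```python
import re
import sqlite3

# Hypothetical example data; not taken from the project.
ROWS = [("North", 120.0), ("South", 80.0), ("North", 30.0), ("East", 50.0)]
COLUMNS = {"region", "amount"}

# Words the toy grammar understands, mapped to SQL aggregate functions.
AGGREGATES = {"total": "SUM", "average": "AVG", "highest": "MAX", "lowest": "MIN"}


def build_connection() -> sqlite3.Connection:
    """Create an in-memory database holding the toy `sales` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", ROWS)
    return conn


def parse_question(question: str) -> str:
    """Convert a question like 'total amount by region' into a SQL string."""
    pattern = r"(total|average|highest|lowest) (\w+)(?: by (\w+))?"
    match = re.search(pattern, question.lower())
    if match is None:
        raise ValueError(f"cannot parse question: {question!r}")
    agg_word, measure, group = match.groups()
    # Only accept known column names, so nothing unexpected reaches the SQL.
    if measure not in COLUMNS or (group is not None and group not in COLUMNS):
        raise ValueError(f"unknown column in question: {question!r}")
    agg = AGGREGATES[agg_word]
    if group is None:
        return f"SELECT {agg}({measure}) FROM sales"
    return (
        f"SELECT {group}, {agg}({measure}) FROM sales "
        f"GROUP BY {group} ORDER BY {group}"
    )


def answer(conn: sqlite3.Connection, question: str):
    """Parse the question, execute the program, and return (sql, rows)."""
    sql = parse_question(question)
    return sql, conn.execute(sql).fetchall()


def describe(rows) -> str:
    """Produce a very simple natural language description of the result."""
    if len(rows) == 1 and len(rows[0]) == 1:
        return f"The answer is {rows[0][0]}."
    return "; ".join(f"{key}: {value}" for key, value in rows) + "."


if __name__ == "__main__":
    connection = build_connection()
    for q in ["What is the total amount by region?", "What is the average amount?"]:
        sql, rows = answer(connection, q)
        print(q, "->", sql, "->", describe(rows))
```

A real system would replace `parse_question` with a learned model and would also need the conversation context, ambiguity handling, and visualization steps that this sketch leaves out.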
Project News:
- Our NL technology on Excel was announced at Ignite 2019. A review article is available from Mr. Excel (an MVP user of Excel).
- Our NL technology on Excel has been released.
People
Jian-Guang Lou
Sr. Principal Research Manager
Zhitao Hou
Sr. Principal Researcher
Haidong Zhang
Principal Architect
Yan Gao
Researcher
Zeqi Lin
Senior Researcher
Yan XIAO
RSDE
MSRA
Dongmei Zhang
Distinguished Scientist, Deputy Managing Director, Microsoft Research Asia