Speakeasy

Established: January 1, 2017

Knowledge bases, such as the Bing and Google knowledge graphs, contain millions of entities (people, places, etc.) and billions of facts about them. While much is known about entities, little is known about the actions these entities relate to. The Web, on the other hand, holds a great deal of information about human tasks. A website for restaurant reservations, for example, implicitly knows about various restaurant-related actions (making reservations, delivering food, etc.), the inputs these actions require, and their expected outputs; it can also be automated to execute those actions. To harvest action knowledge from websites, we have built Etna. Users demonstrate how to accomplish various tasks in a target website, and Etna constructs an action-state model of the website, visualized as an action graph. An action graph includes definitions of tasks and actions, knowledge about their start and end states, and execution scripts for their automation.

Overview of the Etna process
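
To make the idea of an action graph concrete, here is a minimal sketch in Python. It is not Etna's actual data model: the class names (`Action`, `ActionGraph`), the state labels, and the restaurant actions are all hypothetical, chosen only to illustrate how actions with start/end states, required inputs, and an attached script might be stored and how a task could be resolved into a sequence of actions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """One demonstrated UI action: a directed edge between two page states.

    All fields are illustrative assumptions, not Etna's real schema.
    """
    name: str
    start_state: str
    end_state: str
    inputs: tuple = ()   # parameters the action needs, e.g. ("party_size",)
    script: str = ""     # placeholder for a replayable UI script


@dataclass
class ActionGraph:
    """States are nodes; actions are directed edges between them."""
    actions: list = field(default_factory=list)

    def add(self, action):
        self.actions.append(action)

    def plan(self, start, goal):
        """Breadth-first search for the shortest action sequence from start to goal."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            state, path = queue.popleft()
            if state == goal:
                return path
            for action in self.actions:
                if action.start_state == state and action.end_state not in seen:
                    seen.add(action.end_state)
                    queue.append((action.end_state, path + [action]))
        return None  # goal is not reachable from start


def required_inputs(path):
    """Collect, in order, the inputs a task needs across all of its actions."""
    return [name for action in path for name in action.inputs]


def build_demo_graph():
    """A hypothetical restaurant website with four demonstrated actions."""
    graph = ActionGraph()
    graph.add(Action("search_restaurant", "home", "results", ("cuisine", "city")))
    graph.add(Action("open_restaurant", "results", "details", ("restaurant_name",)))
    graph.add(Action("make_reservation", "details", "confirmation",
                     ("date", "time", "party_size")))
    graph.add(Action("order_delivery", "details", "checkout", ("address",)))
    return graph
```

Under these assumptions, calling `build_demo_graph().plan("home", "confirmation")` yields the three actions leading to a reservation, and `required_inputs` lists the values that would have to be supplied (for example, by a user in conversation) before the attached scripts could run.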

The design of Etna was a long journey that started with an early prototype, Kite, designed for Android apps, and a parallel exploration with Windows apps. Over the years we have applied Etna to various use cases, from web data scrapers to RPA (robotic process automation) agents. In particular, we have used action graphs to build zero-coding, task-oriented chatbots that allow users to carry out tasks (e.g., ordering a cab) through natural language conversation, and to train a natural language interface (Flin) for navigating websites by voice.

To reduce the demonstration overhead, we also explored how to automatically “crawl” UI scripts for executing tasks in websites using a reinforcement learning agent (Glider).

People

Jason Kace

Senior Software Engineer
