Transforming raw data into meaningful visuals, such as charts, is key to uncovering hidden trends and valuable insights, but even with advances in AI-powered tools, this process remains complex. Integrating AI into the iterative nature of the data visualization process is particularly challenging, as data analysts often struggle to describe complicated tasks in a single text prompt while lacking the direct control of traditional tools. This highlights the need for smarter, more intuitive solutions that combine AI’s precision with the flexibility of hands-on methods.
To address this, we’re excited to release Data Formulator as an open-source research project. This update builds on last year’s release by combining user interface (UI) interactions for designing charts with natural language input for refining details. Unlike the previous version, which required users to choose between the two methods, this unified approach lets them solve complex tasks iteratively and with less effort.
- Download Data Formulator
Creating and refining charts with the Concept Encoding Shelf and data threads
With Data Formulator, data analysts can now create charts from scratch or select from existing designs through data threads. The UI features a pane called the “Concept Encoding Shelf,” where users can build their chart by dragging various data fields into it and defining them or by creating new ones. A large language model (LLM) on the backend processes this input, generating the necessary code to produce the visual and updating the data threads for future use. This process is illustrated in Figure 2.
Data threads enable users to review and modify charts they created previously. Because the LLM adapts past code to new contexts, editing and refining a chart takes less effort; without this feature, users would need to provide more detailed prompts to recreate designs from scratch. Users can keep updating their charts in this way until they’re satisfied.
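To make the idea of reusing past work concrete, here is a minimal Python sketch of how a data thread could be represented. It is an illustration under our own assumptions, not Data Formulator's actual implementation: the names `ThreadNode`, `history`, and `build_prompt` are hypothetical, and the stored code strings are placeholders that are never executed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ThreadNode:
    """One step in a hypothetical data thread: an instruction and the code produced for it."""
    instruction: str
    code: str
    parent: Optional["ThreadNode"] = None


def history(node: Optional[ThreadNode]) -> List[Tuple[str, str]]:
    """Return (instruction, code) pairs from the thread's root down to `node`."""
    steps = []
    while node is not None:
        steps.append((node.instruction, node.code))
        node = node.parent
    return list(reversed(steps))


def build_prompt(node: ThreadNode, new_instruction: str) -> str:
    """Assemble the text a model could receive: earlier steps first, then the new request."""
    parts = [
        f"Step {i}: {instruction}\n{code}"
        for i, (instruction, code) in enumerate(history(node), start=1)
    ]
    parts.append(f"New request: {new_instruction}")
    return "\n\n".join(parts)


# Placeholder instructions and code, invented for this example.
root = ThreadNode("Show sales by year", "df.groupby('year')['sales'].sum()")
child = ThreadNode(
    "Only include 2020 onward",
    "df[df.year >= 2020].groupby('year')['sales'].sum()",
    parent=root,
)
prompt = build_prompt(child, "Split by region")
print(prompt)
```

Because each node keeps a link to its parent, a short follow-up request can be sent together with the code that led to the current chart, so the user does not have to restate the earlier steps.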
Data Formulator’s framework
Data Formulator’s architecture separates data transformation from chart configuration, improving both the user experience and AI performance. Upon receiving user specifications, the system follows a three-step process: (1) it generates a Vega-Lite script, which defines how data is visualized; (2) it instructs the AI to handle data transformation; and (3) it creates the chart using the converted data, as illustrated in Figure 4.
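The three steps can be sketched in a few lines of Python. This is a simplified illustration rather than the project's code: the function names, the sample rows, and the derived `profit` field are assumptions made for the example, and the `transform` function stands in for code that the AI would generate.

```python
import json


def build_vega_lite_spec(mark, encodings):
    """Step 1: describe the chart only; no data is attached yet."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "mark": mark,
        "encoding": {
            channel: {"field": name, "type": kind}
            for channel, (name, kind) in encodings.items()
        },
    }


def transform(rows):
    """Step 2: stand-in for AI-generated transformation code.

    Derives a hypothetical 'profit' field from 'revenue' and 'cost'.
    """
    return [{**row, "profit": row["revenue"] - row["cost"]} for row in rows]


def assemble_chart(spec, rows):
    """Step 3: attach the transformed data to the chart specification."""
    fields = {enc["field"] for enc in spec["encoding"].values()}
    missing = sorted(f for f in fields if rows and f not in rows[0])
    if missing:
        raise ValueError(f"transformed data lacks fields: {missing}")
    return {**spec, "data": {"values": rows}}


# Sample input, invented for this example.
raw_rows = [
    {"year": 2022, "revenue": 100, "cost": 60},
    {"year": 2023, "revenue": 150, "cost": 90},
]

spec = build_vega_lite_spec(
    "bar", {"x": ("year", "ordinal"), "y": ("profit", "quantitative")}
)
chart = assemble_chart(spec, transform(raw_rows))
print(json.dumps(chart, indent=2))
```

Keeping the specification free of data until the last step means the chart design and the transformation can each be revised without touching the other, which is the separation the architecture relies on.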
Implications and looking forward
Refining how users interact with AI-powered tools is essential for improving how they communicate their requirements, paving the way for more efficient and effective collaboration. By integrating UI elements and natural language input, we designed Data Formulator to let users define their visualization needs with precision, leading to better results and reducing the need for multiple clarifications.
While Data Formulator addresses some challenges in data transformation and visualization authoring, others remain. For example, how can AI assist in cleaning unstructured data without losing critical information? And how can it help users define clear data analysis goals when starting with ambiguous or undefined objectives? We’re actively investigating these research questions and invite you to contribute by building on the Data Formulator codebase.
Learn more about our research efforts on human-AI interaction by exploring how we design dynamic UI widgets for visualization editing. You can also view a demo of the Data Formulator project on GitHub Codespaces.
Acknowledgements
We’d like to thank Bongshin Lee, John Thompson, and Gonzalo Ramos for their feedback and contributions to this project.