Optimizing Big-Data Queries using Program Reasoning

Established: May 1, 2016

This project sits at the intersection of programming languages and database systems. Its goal is to use programming-language techniques to analyze and optimize big-data queries.

We show how program synthesis can discover optimizations that today's big-data query optimizers miss. A big-data query optimizer produces an executable plan composed of map-reduce stages; we use program synthesis to produce plans with fewer stages than the optimizer's plans. A query optimizer applies rules that transform a tree of operators, and today that set of operators is limited to SQL operators. Our synthesis-based technique shows that this is insufficient. In subsequent work, we extend the optimizer with new operators and new rules that target these operators. Several components of this work are incorporated into the Spark engine of Azure Synapse and are available for production use.
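To make the idea of "fewer stages" concrete, here is a minimal sketch in Python. It is an illustration only, not the project's synthesis technique or its actual operators. The example query (for each key, report the maximum value and how many rows attain it), the function names `plan_sql_style` and `plan_single_stage`, and the in-memory data are all hypothetical. The first plan mirrors what a query restricted to SQL operators might look like (aggregate, join back to the input, aggregate again); the second partitions the input once and uses a custom reducer that makes a single pass over each group. How many physical stages either plan needs on a real engine depends on that engine.

```python
from collections import defaultdict

def plan_sql_style(rows):
    """Hypothetical plan built from SQL-like operators: GROUP BY, JOIN, GROUP BY."""
    # Step 1: per-key maximum (an aggregation over the input).
    max_by_key = {}
    for key, value in rows:
        if key not in max_by_key or value > max_by_key[key]:
            max_by_key[key] = value
    # Step 2: join the input against the step-1 result, keeping matching rows.
    matched = [(key, value) for key, value in rows if value == max_by_key[key]]
    # Step 3: count the matching rows per key (a second aggregation).
    counts = defaultdict(int)
    for key, _ in matched:
        counts[key] += 1
    return {key: (max_by_key[key], counts[key]) for key in max_by_key}

def plan_single_stage(rows):
    """Hypothetical plan with one partition-by-key step and a custom reducer."""
    groups = defaultdict(list)
    for key, value in rows:  # map and shuffle: partition rows by key
        groups[key].append(value)
    result = {}
    for key, values in groups.items():  # reduce: one pass over each group
        best, count = None, 0
        for v in values:
            if best is None or v > best:
                best, count = v, 1
            elif v == best:
                count += 1
        result[key] = (best, count)
    return result

rows = [("a", 3), ("a", 5), ("a", 5), ("b", 1), ("b", 0)]
# Both plans compute the same answer; only their shape differs.
assert plan_sql_style(rows) == plan_single_stage(rows)
```

The custom reducer in the second plan is not expressible as a single standard SQL aggregate, which is the kind of gap that new operators and rules in an optimizer could close.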

We are also building a new compiler that generates efficient machine code for SQL queries. The compiler is powered by a new domain-specific intermediate representation (IR) for SQL. We apply compiler optimizations to the IR and generate low-level code that targets CPUs today and will target domain-specific accelerators in the future.
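The pipeline described above (query, IR, optimization pass, generated code) can be sketched in a few lines of Python. This is a toy under stated assumptions, not the project's IR or compiler: the node types `Scan`, `Filter`, and `Project`, the `fuse_filters` pass, and the `generate` function are all hypothetical, predicates are plain Python expressions over a variable named `row`, and the "low-level code" here is Python source for a single fused loop rather than machine code.

```python
from dataclasses import dataclass

@dataclass
class Scan:
    table: str

@dataclass
class Filter:
    child: object
    predicate: str  # Python expression over the variable `row`

@dataclass
class Project:
    child: object
    expr: str  # Python expression over the variable `row`

def fuse_filters(node):
    """IR-level optimization: merge adjacent Filter nodes into one."""
    if isinstance(node, Scan):
        return node
    child = fuse_filters(node.child)
    if isinstance(node, Filter):
        if isinstance(child, Filter):
            merged = f"({child.predicate}) and ({node.predicate})"
            return Filter(child.child, merged)
        return Filter(child, node.predicate)
    return Project(child, node.expr)

def generate(node):
    """Lower a linear operator pipeline to source code for one fused loop."""
    ops = []
    while not isinstance(node, Scan):
        ops.append(node)
        node = node.child
    src = ["def query(tables):", "    out = []",
           f"    for row in tables[{node.table!r}]:"]
    for op in reversed(ops):  # emit operators bottom-up, starting above the scan
        if isinstance(op, Filter):
            src.append(f"        if not ({op.predicate}): continue")
        else:
            src.append(f"        row = {op.expr}")
    src += ["        out.append(row)", "    return out"]
    return "\n".join(src)

plan = Project(
    Filter(Filter(Scan("orders"), "row['qty'] > 1"), "row['price'] < 100"),
    "row['qty'] * row['price']",
)
optimized = fuse_filters(plan)
namespace = {}
exec(generate(optimized), namespace)  # compile the generated source
tables = {"orders": [{"qty": 2, "price": 10},
                     {"qty": 1, "price": 5},
                     {"qty": 3, "price": 200}]}
result = namespace["query"](tables)
```

Working on an IR rather than on SQL text lets a pass such as `fuse_filters` run independently of the code generator, so the same optimized IR could in principle be lowered to different targets.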

People

Kaushik Rajan

Principal Researcher

Akash Lal

Senior Principal Researcher

Aseem Rastogi

Principal Researcher
