PolySem: Efficient Polyglot Analytics on Semantic Data

Poly'23: Polystore systems for heterogeneous data in multiple databases with privacy and security assurances

Data scientists and data engineers spend a significant portion of their time trying to understand, clean, and transform their data before they can even start performing any meaningful analysis. Most database vendors provide business intelligence (BI) tools as an efficient and user-friendly platform for customers to perform their data cleaning, preparation, and linking tasks to obtain actionable semantic data. However, customers are increasingly interested in querying semantic data through various query modalities, including SQL, imperative programming languages such as Python, and natural language queries. Currently, customers are limited to using either the visual interfaces provided by these tools or languages that are specific to the particular tool. In this proposal, we describe techniques to enable the execution of user queries expressed in different modalities on semantic datasets without having to export data out of the BI system. Our techniques comprise automatic translation of user queries into a language-agnostic representation of data processing operations, and subsequent translation of that representation into the specific query language that is amenable to execution on the BI engine. Our evaluation results on business intelligence and decision support benchmarks suggest significant improvements in query performance compared to other popular data processing engines.
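The two-stage translation described above can be illustrated with a minimal sketch. It is not the system's actual design: the node types (Scan, Filter, Aggregate), the function to_sql, and the example table and column names are all hypothetical. Generic SQL stands in for the BI engine's own query language, which the abstract does not name.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical language-agnostic plan nodes (assumed for illustration only).
@dataclass
class Scan:
    table: str


@dataclass
class Filter:
    child: "Node"
    predicate: str  # boolean expression, assumed already normalized


@dataclass
class Aggregate:
    child: "Node"
    group_by: List[str]
    measures: List[str]  # e.g. "SUM(amount) AS total"


Node = Union[Scan, Filter, Aggregate]


def to_sql(node: Node) -> str:
    """Render a plan tree as nested SQL (a stand-in for the engine language)."""
    if isinstance(node, Scan):
        return f"SELECT * FROM {node.table}"
    if isinstance(node, Filter):
        return f"SELECT * FROM ({to_sql(node.child)}) AS t WHERE {node.predicate}"
    if isinstance(node, Aggregate):
        cols = ", ".join(node.group_by + node.measures)
        sql = f"SELECT {cols} FROM ({to_sql(node.child)}) AS t"
        if node.group_by:
            sql += " GROUP BY " + ", ".join(node.group_by)
        return sql
    raise TypeError(f"unsupported plan node: {node!r}")


# A front end for any modality (SQL, Python, natural language) would build
# a plan like this one; only the back end depends on the target engine.
plan = Aggregate(
    child=Filter(Scan("sales"), "region = 'EMEA'"),
    group_by=["product"],
    measures=["SUM(amount) AS total"],
)
print(to_sql(plan))
```

Because each front end only has to produce the shared plan and each back end only has to consume it, supporting a new query modality in such a design would not require touching the engine-specific emitter.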