Consolidation of Queries with User-defined Functions

Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation |

Published by ACM

Motivated by streaming and data analytics scenarios where many queries operate on the same data and perform similar computations, we propose program consolidation for merging multiple user-defined functions (UDFs) that operate on the same input. Program consolidation exploits common computations between UDFs to generate an equivalent optimized function whose execution cost is often much smaller (and never greater) than the sum of the costs of executing each function individually. We present a sound consolidation calculus and an effective algorithm for consolidating multiple UDFs. Our approach is purely static and uses symbolic SMT-based techniques to identify shared or redundant computations. We have implemented the proposed technique on top of the Naiad data processing system. Our experiments show that our algorithm dramatically improves overall job completion time when executing user-defined filters that operate on the same data and perform similar computations.