SparkCruise: handsfree computation reuse in Spark

Very Large Data Bases |

Published by VLDB Endowment

PDF | Publication

Interactive data analytics is often inundated with common computations across multiple queries. These redundancies result in poor query performance and higher overall cost for the interactive query sessions. Obviously, reusing these common computations could lead to cost savings. However, it is difficult for the users to manually detect and reuse the common computations in their fast moving interactive sessions. In the paper, we propose to demonstrate SparkCruise, a computation reuse system that automatically selects the most useful common computations to materialize based on the past query workload. SparkCruise materializes these computations as part of query processing, so the users can continue with their query processing just as before and computation reuse is automatically applied in the background — all without any modifications to the Spark code. We will invite the audience to play with several scenarios, such as workload redundancy insights and pay-as-you-go materialization, highlighting the utility of SparkCruise.