Sketch-Based Multi-Query Processing over Data Streams

in Data-Centric Systems and Applications book series (DCSA)

2016

We consider the problem of approximately answering multiple general aggregate SQL queries over continuous data streams with limited memory. Our method extends the randomizing techniques of Alon et al. that compute small “sketch” summaries of the streams that can then be used to provide approximate answers to aggregate queries with provable guarantees on the approximation error. By intelligently sharing the sketches among multiple queries, the memory required can be reduced. We provide necessary and sufficient conditions for the sketch sharing to result in correct estimation and address optimization problems that arise. We also demonstrate how existing statistical information on the base data (e.g., histograms) can be used in the proposed framework to improve the quality of the approximation provided by our algorithms. The key idea is to intelligently partition the domain of the underlying attribute(s) and, thus, decompose the sketching problem in a way that provably tightens our guarantees.