Consistent Histograms With Distinct Value Counts

Dan Suciu; Raghav Kaushik

International Conference on Very Large Data Bases | August 2009

Published by Very Large Data Bases Endowment Inc.

Self-tuning histograms have been proposed in the past as an attempt to leverage feedback from query execution. However, the focus thus far has been on histograms that store only cardinalities. In this paper, we study consistent histogram construction from query feedback that also takes distinct value counts into account. We first show how the entropy maximization (EM) principle can be leveraged to identify a distribution that approximates the data given the execution feedback while making the fewest additional assumptions. This EM model takes both distinct value counts and cardinalities into account. However, we find that it is prohibitively expensive to compute. We thus consider an alternative formulation of consistency: for a given query workload, the goal is to minimize the L2 distance between the true and estimated cardinalities. This approach also handles both cardinalities and distinct value counts. We propose an efficient one-pass algorithm, with several theoretical properties, that models this formulation. Our experiments show that this approach produces improvements in accuracy similar to those of the EM-based approach while being significantly more efficient computationally.
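
The L2 objective mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's one-pass algorithm, and it covers cardinalities only, not distinct value counts. The bucket layout `(low, high, count)`, the uniform-within-bucket estimator, the function names `estimate` and `l2_distance`, and the feedback records are all assumptions made up for this illustration.

```python
from math import sqrt

# Hypothetical representation: each bucket is (low, high, count) and covers
# the half-open integer range [low, high).
def estimate(histogram, q_low, q_high):
    """Estimate the cardinality of the range query [q_low, q_high),
    assuming values are spread uniformly within each bucket."""
    total = 0.0
    for low, high, count in histogram:
        # Width of the intersection between the bucket and the query range.
        overlap = max(0, min(high, q_high) - max(low, q_low))
        total += count * overlap / (high - low)
    return total

def l2_distance(histogram, feedback):
    """L2 distance between observed and estimated cardinalities.
    feedback is a list of (q_low, q_high, observed_cardinality)."""
    return sqrt(sum((obs - estimate(histogram, lo, hi)) ** 2
                    for lo, hi, obs in feedback))

# Invented feedback records, for illustration only.
feedback = [(0, 10, 40), (10, 20, 60), (0, 5, 30)]

coarse = [(0, 20, 100)]                            # one bucket, no feedback used
refined = [(0, 5, 30), (5, 10, 10), (10, 20, 60)]  # buckets that match the feedback

print(l2_distance(coarse, feedback))   # 15.0
print(l2_distance(refined, feedback))  # 0.0
```

Under these assumptions, a histogram is consistent with the workload feedback when its L2 distance is zero, as for `refined` above; the single-bucket `coarse` histogram misestimates all three queries.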
