Query-biased summaries for tabular data

Proceedings of the Australasian Document Computing Symposium

Published by ACM


Government, research, and academic data portals publish large amounts of public data, but current tools make discovery difficult. In particular, search results do not help a user decide whether to commit to downloading what might be a large data set.

We describe a method for producing query-biased summaries of tabular data, which aims to support a user’s download decision, or even to answer the question on the spot with no further interaction. The method infers simple types in the data and query; automatically refines queries where that makes sense; extracts relevant subsets of the complete table; and generates both graphical and tabular summaries of what remains. A small-scale user study suggests that this both helps users identify useful results (fewer false negatives) and reduces wasted downloads (fewer false positives).
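
As a rough illustration of two of these steps (simple type inference and extraction of the relevant subset), the following Python sketch may help. It is not the implementation described here: the function names, the three-type scheme (year, number, text), the exact-match rule, and the sample rows are all assumptions made for illustration. Query refinement and graphical summaries are left out, and query terms that match nothing are simply ignored.

```python
import re
from collections import Counter

YEAR = re.compile(r"(1[5-9]|20)\d\d")  # assumed year range: 1500-2099


def infer_type(value):
    """Guess a simple type for a cell or query term: 'year', 'number' or 'text'."""
    s = value.strip()
    if YEAR.fullmatch(s):
        return "year"
    try:
        float(s.replace(",", ""))
        return "number"
    except ValueError:
        return "text"


def column_types(header, rows):
    """Assign each column the most common inferred type of its non-empty cells."""
    types = []
    for i in range(len(header)):
        counts = Counter(infer_type(r[i]) for r in rows if r[i].strip())
        types.append(counts.most_common(1)[0][0] if counts else "text")
    return types


def summarise(header, rows, query, max_rows=5):
    """Return (columns, rows): the part of the table that the query touches."""
    types = column_types(header, rows)
    keep_cols = set()
    filters = []  # (column index, wanted cell value)
    for term in query.lower().split():
        term_type = infer_type(term)
        for i, name in enumerate(header):
            if name.lower() == term:
                keep_cols.add(i)  # the query names this column
            elif types[i] == term_type and any(
                r[i].strip().lower() == term for r in rows
            ):
                keep_cols.add(i)
                filters.append((i, term))  # the query names a value in this column
    kept = [r for r in rows if all(r[i].strip().lower() == t for i, t in filters)]
    cols = sorted(keep_cols) or list(range(len(header)))  # fall back to all columns
    return [header[i] for i in cols], [[r[i] for i in cols] for r in kept[:max_rows]]


# Made-up sample table, for illustration only.
header = ["city", "state", "year", "rainfall_mm"]
rows = [
    ["Canberra", "ACT", "2018", "500"],
    ["Canberra", "ACT", "2019", "400"],
    ["Sydney", "NSW", "2019", "900"],
]
columns, subset = summarise(header, rows, "sydney rainfall_mm")
```

In this sketch a query term either names a column, which keeps that column, or names a value in a column of the same inferred type, which keeps the column and filters the rows. Matching on type keeps a term such as 2019 from being compared against free-text columns.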