Program for TPC-H Data Generation with Skew

The schema and queries of the TPC-H (formerly TPC-D) benchmark are widely used in the database community. Last published: April 26, 2016.

Download
  • Version: 1.1

    Date Published: 5/12/2016

    File Name: TPCDSkew.zip

    File Size: 246.0 KB

    The schema and queries of the TPC-H (formerly TPC-D) benchmark are widely used in the database community. The benchmark requires that data for columns in the database be generated from a uniform distribution. However, this requirement makes it hard for users to draw conclusions about the robustness and effectiveness of their systems, since real-world data distributions are often non-uniform. We have therefore created a new data generation program for TPC-H that can generate a database whose columns have non-uniform (skewed) data distributions. In particular, the program can generate data from a Zipfian distribution, where the Zipf value (z), which controls the degree of skew in the data, is a parameter passed to the program. The program can also generate a database with a “mixed” data distribution, where the skew of each column is chosen at random from the Zipf values {0, 1, 2, 3, 4}. Note that our changes do not affect the total number of rows in the tables or the total database size.
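
To make the role of z concrete, here is a minimal Python sketch of Zipfian sampling over a finite set of column values. It is not the program's actual implementation, which this page does not describe. It assumes the common finite-domain form in which the k-th most frequent value is drawn with probability proportional to 1/k^z, so z = 0 gives a uniform distribution and larger z gives heavier skew. All function names (`zipf_weights`, `sample_column`, `pick_mixed_skews`) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter
from itertools import accumulate


def zipf_weights(n, z):
    """Unnormalized Zipf weights for ranks 1..n: weight(k) = 1 / k**z."""
    return [1.0 / (k ** z) for k in range(1, n + 1)]


def sample_column(values, z, num_rows, rng):
    """Draw exactly num_rows values; values[0] is the most likely when z > 0.

    The row count is fixed by the caller, so changing z changes only the
    distribution of values, not the table size.
    """
    cumulative = list(accumulate(zipf_weights(len(values), z)))
    return rng.choices(values, cum_weights=cumulative, k=num_rows)


def pick_mixed_skews(columns, rng, choices=(0, 1, 2, 3, 4)):
    """Assumed reading of the "mixed" mode: one random z per column."""
    return {column: rng.choice(choices) for column in columns}


if __name__ == "__main__":
    rng = random.Random(42)
    domain = list(range(1, 101))  # hypothetical column with 100 distinct values
    for z in (0, 1, 2, 3, 4):
        column = sample_column(domain, z, 10_000, rng)
        top_share = Counter(column)[domain[0]] / len(column)
        print(f"z={z}: {len(column)} rows, top value share = {top_share:.3f}")
    # Column names below are only examples of inputs to the mixed mode.
    print(pick_mixed_skews(["l_quantity", "l_discount", "o_orderdate"], rng))
```

Running the script shows the share of the most frequent value growing with z while the row count stays constant, which mirrors the statement above that skew does not change table sizes.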
  • Supported Operating Systems

    Windows 10, Windows 7, Windows 8

    • Click Download and follow the instructions.
