Identifying Earmarks in Congressional Bills

  • Ellery Wulczyn ,
  • Madian Khabsa ,
  • Vrushank Vora ,
  • Matthew Heston ,
  • Joe Walsh ,
  • Christopher Berry ,
  • Rayid Ghani

Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016) |

Published by ACM

Publication

Earmarks are legislative provisions that direct federal funds to specific projects, circumventing the competitive grant-making process of federal agencies. Identifying and cataloging earmarks is a tedious, time-consuming process carried out by experts from public interest groups. In this paper, we present a machine learning system for automatically extracting earmarks from congressional bills and reports. We first describe a table-parsing algorithm for extracting budget allocations from appropriations tables in congressional bills. We then use machine learning classifiers to identify budget allocations as earmarked objects with an out of sample ROC AUC score of 0.89. Using this system, we construct the first publicly available database of earmarks dating back to 1995. Our machine learning approach adds transparency, accuracy, and speed to the congressional appropriations process.