Landfill: an open dataset of code smells with public evaluation

  • Fabio Palomba
  • Dario Di Nucci
  • ,
  • Gabriele Bavota
  • Rocco Oliveto
  • Denys Poshyvanyk
  • Andrea De Lucia

2015 Mining Software Repositories

Published by IEEE


Code smells are symptoms of poor design and implementation choices that may hinder code comprehension and possibly increase the change- and fault-proneness of source code. Several techniques have been proposed in the literature for detecting code smells. These techniques are generally evaluated by comparing their accuracy on a set of detected candidate code smells against a manually produced oracle. Unfortunately, such comprehensive sets of annotated code smells are not available in the literature, with only a few exceptions. In this paper we contribute (i) a dataset of 243 instances of five types of code smells identified from 20 open source software projects, (ii) a systematic procedure for validating code smell datasets, (iii) Landfill, a Web-based platform for sharing code smell datasets, and (iv) a set of APIs for programmatically accessing Landfill's contents. Anyone can contribute to Landfill by (i) improving existing datasets (e.g., adding missing instances of code smells, flagging possibly incorrectly classified instances), and (ii) sharing and posting new datasets. Landfill is available at www.sesa.unisa.it/landfill/, while the video demonstrating its features in action is available at http://www.sesa.unisa.it/tools/landfill.jsp.
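The oracle-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each smell instance is represented as a (smell type, code element) pair and that accuracy is reported as precision, recall, and F-measure, which is a common choice but is not stated in the abstract. The function name `evaluate`, the element names, and the sample data are hypothetical, and the sketch does not use the Landfill APIs.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical representation of one smell instance:
# (smell type, fully qualified name of the affected code element).
Instance = Tuple[str, str]


class Accuracy(NamedTuple):
    precision: float
    recall: float
    f_measure: float


def evaluate(detected: Set[Instance], oracle: Set[Instance]) -> Accuracy:
    """Compare a detector's candidates against a manually produced oracle."""
    true_positives = len(detected & oracle)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(oracle) if oracle else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return Accuracy(precision, recall, f_measure)


# Illustrative data only; these instances are not taken from the dataset.
oracle = {
    ("Blob", "org.example.Manager"),
    ("Blob", "org.example.Util"),
    ("Feature Envy", "org.example.Order.total"),
    ("Feature Envy", "org.example.Parser.parse"),
}
detected = {
    ("Blob", "org.example.Manager"),
    ("Blob", "org.example.Helper"),
    ("Feature Envy", "org.example.Parser.parse"),
}

result = evaluate(detected, oracle)
print(f"precision={result.precision:.2f} "
      f"recall={result.recall:.2f} "
      f"f_measure={result.f_measure:.2f}")
```

Treating instances as set elements keeps the comparison independent of the order in which a detector reports its candidates; a real evaluation would also need to agree on how code elements are named so that detector output and oracle entries match.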