CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

  • Shuai Lu
  • Daya Guo
  • Shuo Ren
  • Junjie Huang
  • Alexey Svyatkovskiy
  • Ambrosio Blanco
  • Colin Clement
  • Dawn Drain
  • Daxin Jiang (姜大昕)
  • Duyu Tang
  • Ge Li
  • Lidong Zhou
  • Linjun Shou
  • Long Zhou
  • Ming Zhou
  • Nan Duan
  • Neel Sundaresan
  • Shao Kun Deng

arXiv

Benchmark datasets significantly accelerate research on programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems (BERT-style, GPT-style, and Encoder-Decoder models) to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.

Publication Downloads

CodeXGLUE

September 28, 2020

CodeXGLUE is a benchmark dataset and open challenge for code intelligence. It provides a collection of code intelligence tasks and a platform for model evaluation and comparison. CodeXGLUE stands for General Language Understanding Evaluation benchmark for CODE. It comprises 14 datasets for 10 diversified code intelligence tasks covering four scenarios: code-code, text-code, code-text, and text-text.
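
The grouping of tasks by scenario can be pictured with a small registry. The following is a minimal sketch, not part of the CodeXGLUE platform: the `Task` class and the `tasks_per_scenario` helper are hypothetical names introduced for illustration. The task names are not given on this page; they follow the task overview in the CodeXGLUE paper and should be checked against the official list. Each scenario name is read as an input-output pair, so "code-text" means code in, text out.

```python
from collections import Counter
from dataclasses import dataclass

# The four scenarios named in the description above.
SCENARIOS = ("code-code", "text-code", "code-text", "text-text")


@dataclass(frozen=True)
class Task:
    """Hypothetical record for one benchmark task (illustration only)."""
    name: str
    scenario: str

    @property
    def modalities(self):
        # "code-text" -> ("code", "text"): input modality, output modality.
        source, target = self.scenario.split("-")
        return source, target


# Assumption: task names taken from the CodeXGLUE paper's task overview,
# not from this page. Verify against the official benchmark before relying on them.
TASKS = [
    Task("clone detection", "code-code"),
    Task("defect detection", "code-code"),
    Task("cloze test", "code-code"),
    Task("code completion", "code-code"),
    Task("code repair", "code-code"),
    Task("code translation", "code-code"),
    Task("natural language code search", "text-code"),
    Task("text-to-code generation", "text-code"),
    Task("code summarization", "code-text"),
    Task("documentation translation", "text-text"),
]


def tasks_per_scenario(tasks):
    """Count how many tasks fall under each of the four scenarios."""
    counts = Counter(task.scenario for task in tasks)
    return {scenario: counts.get(scenario, 0) for scenario in SCENARIOS}


if __name__ == "__main__":
    for scenario, count in tasks_per_scenario(TASKS).items():
        print(f"{scenario}: {count} task(s)")
```

Under these assumptions the registry holds 10 tasks, matching the count stated above, with most of them in the code-code scenario.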