FigureQA Dataset

Highlights

100,000
Figure images in the training set

1,327,368
Question-answer pairs in the training set

100
Unique colors and possible names for figure plot elements

15
Question types for quantitative attributes

Details

Dataset Split   # Images   # Questions   Has Answers & Annotations?   Color Scheme
Train           100,000    1,327,368     Yes                          Scheme 1
Validation 1    20,000     265,106       Yes                          Scheme 1
Validation 2    20,000     265,798       Yes                          Scheme 2
Test 1          20,000     265,024       No                           Scheme 1
Test 2          20,000     265,402       No                           Scheme 2
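
As a quick sanity check on the split sizes, the short Python sketch below recomputes totals and the average number of questions per image from the figures listed for each split. The dictionary name `SPLITS` and the helper `questions_per_image` are illustrative only and are not part of any FigureQA tooling.

```python
# Split sizes copied from the split listing: (number of images, number of questions).
# The names SPLITS and questions_per_image are hypothetical, chosen for this sketch.
SPLITS = {
    "Train": (100_000, 1_327_368),
    "Validation 1": (20_000, 265_106),
    "Validation 2": (20_000, 265_798),
    "Test 1": (20_000, 265_024),
    "Test 2": (20_000, 265_402),
}


def questions_per_image(split):
    """Return the average number of questions per figure image in a split."""
    images, questions = SPLITS[split]
    return questions / images


total_images = sum(images for images, _ in SPLITS.values())
total_questions = sum(questions for _, questions in SPLITS.values())

if __name__ == "__main__":
    for name in SPLITS:
        print(f"{name}: {questions_per_image(name):.2f} questions per image")
    print(f"Total: {total_images:,} images, {total_questions:,} questions")
```

Each split works out to roughly 13 questions per figure image.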

 

Unique Features

Additionally, the following features make FigureQA a distinct visual question-answering (VQA) and reasoning dataset:

  • It is entirely synthetically generated. Any number of samples can be generated in a configurable and extensible manner.
  • Each figure image is accompanied by the source data used to create it. This data can be used as input features or a learning target, and can be used to formulate questions and answers.
  • Rich bounding box annotations for all plot elements are extracted automatically and included with each generated figure image.

Figure Color Schemes

To color and identify plot elements, 100 colors were selected from the X11 named color set. Colors were chosen to have a large color distance from white, the background color, and some names were modified to enhance readability.

To evaluate models on unseen color combinations, we provide validation and test sets in two color schemes built from two disjoint color sets. In the training color scheme, each figure type is colored with one of the two sets; in the alternate scheme, used by the second validation and test sets, the assignment is swapped so that each figure type is colored with the other set. Every color therefore appears during training, while the pairings of color and figure type in the alternate scheme are unseen. This approach is consistent with the one used in the CLEVR dataset.

For example:

Scheme 1

  • Vertical bar graphs, line charts, and pie charts are colored using 50 unique colors in set A, including crimson, seafoam, and royal blue.
  • Horizontal bar graphs and dot line charts are colored using 50 unique colors in set B, including light coral, sienna, and web purple.

Scheme 2

  • Vertical bar graphs, line charts, and pie charts are colored using 50 unique colors in set B, including light coral, sienna, and web purple.
  • Horizontal bar graphs and dot line charts are colored using 50 unique colors in set A, including crimson, seafoam, and royal blue.
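
The swap between the two schemes can be sketched in a few lines of Python. This is only an illustration of the assignment described above, not the dataset's generation code: the figure-type strings, the group names, and the function `color_set_for` are hypothetical, and only the three example colors listed for each set are shown rather than the full 50.

```python
# Example colors named in the text; each real set holds 50 colors.
SET_EXAMPLES = {
    "A": ("crimson", "seafoam", "royal blue"),
    "B": ("light coral", "sienna", "web purple"),
}

# Figure types grouped as in the scheme descriptions (identifier strings are assumed).
GROUP_1 = {"vertical bar graph", "line chart", "pie chart"}
GROUP_2 = {"horizontal bar graph", "dot line chart"}


def color_set_for(figure_type, scheme):
    """Return "A" or "B": the color set used for a figure type under a scheme."""
    if scheme not in (1, 2):
        raise ValueError(f"unknown scheme: {scheme!r}")
    if figure_type in GROUP_1:
        set_in_scheme_1 = "A"
    elif figure_type in GROUP_2:
        set_in_scheme_1 = "B"
    else:
        raise ValueError(f"unknown figure type: {figure_type!r}")
    if scheme == 1:
        return set_in_scheme_1
    # Scheme 2 swaps the assignment for every figure type.
    return "B" if set_in_scheme_1 == "A" else "A"


if __name__ == "__main__":
    for figure_type in sorted(GROUP_1 | GROUP_2):
        print(figure_type, color_set_for(figure_type, 1), color_set_for(figure_type, 2))
```

Under this mapping no figure type uses the same color set in both schemes, which is what makes the Scheme 2 splits a test of unseen color and figure type combinations.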

People

Samira Ebrahimi Kahou

Postdoctoral Researcher

McGill University, Mila

Vincent Michalski

Research Intern

MILA

Adam Atkinson

Software Developer

Akos Kadar

Research Intern

Yoshua Bengio

Founder and Scientific Director

Mila – Quebec AI Institute

Mahmoud Adada

Principal Engineering Manager

Rahul Mehrotra

Senior Program Manager