Exploring Collection of Sign Language Videos through Crowdsourcing

  • Abraham Glasser,
  • Fyodor Minakov,
  • Naomi Caselli,
  • William Thies

2022 Conference on Computer Supported Cooperative Work

Published by ACM

Inadequate sign language data currently impedes the advancement of sign language machine learning (ML) and AI. Models trained on existing datasets are limited by those datasets' small size and lack of diverse signers recorded in real-world settings. Complex labeling problems in particular often limit scale. In this work, we explore the potential for crowdsourcing to help overcome these barriers. To do this, we ran a user study with exploratory crowdsourcing tasks designed to support scalability: 1) recording videos of specific content, thereby enabling automatic, scalable labeling, and 2) performing quality control checks for execution consistency, further reducing post-processing requirements. We also provided workers with a searchable view of the crowdsourced dataset, to boost engagement and transparency and to align with Deaf community values. In our user study, 29 participants used our exploratory tasks to record 1906 videos and perform 2331 quality control checks. Our results suggest that a crowd of signers may be able to generate high-quality recordings and perform reliable quality control, and that the signing community values visibility into the resulting dataset.