Understanding Blind People’s Experiences with Computer-Generated Captions of Social Media Images

  • Haley MacLeod,
  • Cynthia L. Bennett,
  • Meredith Ringel Morris,

Proceedings of CHI 2017

Published by ACM

Research advancements allow computational systems to automatically caption social media images. Often, these captions are evaluated by sighted humans using the image as a reference. Here, we explore how blind and visually impaired people experience these captions in two studies about social media images. Using a contextual inquiry approach (n=6 blind/visually impaired), we found that blind people place considerable trust in automatically generated captions, filling in details to resolve differences between an image’s context and an incongruent caption. We built on this in-person study with a second, larger online experiment (n=100 blind/visually impaired) to investigate the role of phrasing in encouraging trust or skepticism in captions. We found that captions emphasizing the probability of error, rather than correctness, encouraged people to attribute incongruence to an incorrect caption, rather than to missing details. Where existing research has focused on encouraging trust in intelligent systems, we conclude by challenging this assumption and considering the benefits of encouraging appropriate skepticism.

Combining Human and Machine Intelligence to Describe Images to People with Vision Impairments

This talk was presented as part of the CVPR 2020 VizWiz Grand Challenge Workshop. More information about the workshop can be found at https://vizwiz.org/workshops/2020-workshop/.