Where’s my stuff? Developing AI with help from people who are blind or low vision to meet their needs

By the ORBIT project team at City, University of London and Microsoft Research

A photo of 10 images taken from a pilot study by people who are blind or low vision.

Microsoft AI for Accessibility is funding the ORBIT research project, which is enlisting the help of people who are blind or low vision to build a new dataset. People who are blind or low vision can contribute to the project by providing videos of things found in their daily lives. The goal is to improve automatic object recognition so that it can identify specific personal items. The data will be used for training and testing artificial intelligence (AI) models that personalize object recognition. In contrast to previous research efforts, we will request videos rather than images from users who are blind or low vision, as videos provide a richer set of information.

To help us with our research, we have conducted a pilot study to further investigate how to collect these videos, and we are currently recruiting users who are blind or low vision in the UK to record videos of things that are important to them. Visit the ORBIT dataset homepage (opens in new tab) for more information on the study and how to sign up.

To maintain privacy and confidentiality, users’ contributions to the pilot study, and to all future phases of this research, are anonymized and checked before being included in the dataset. Any videos containing information that could lead back to the identity of the users are removed.


Smartphones are really useful for making visual information accessible to people who are blind or low vision. For instance, the Seeing AI (opens in new tab) app allows you to take a picture of your surroundings in scene mode, and it then reads aloud what it recognizes in the picture (for example, “a person sitting on a sofa”). AI recognizes objects in a scene easily enough if they are commonly found. For now, however, these apps can’t tell you which of the things they recognize is yours, and they don’t know about things that are particularly important to users who are blind or low vision. For example, has someone moved your keys again? Did your white cane get mixed up with someone else’s? Imagine being able to easily identify the things that are important to you or to quickly locate your personal stuff.
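To make the idea concrete, here is a minimal sketch of generic scene-level object recognition of the kind such apps build on. This is not Seeing AI’s actual pipeline; it simply uses torchvision’s pretrained COCO detector, and the image path `scene.jpg` is a placeholder:

```python
# Sketch: list common objects detected in a photo and compose a simple
# spoken-style description. Uses torchvision's pretrained Faster R-CNN;
# this is a generic illustration, not Seeing AI's actual pipeline.
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]  # COCO class names

img = read_image("scene.jpg")            # placeholder photo path
batch = [weights.transforms()(img)]      # preprocessing for this model
with torch.no_grad():
    detections = model(batch)[0]

# Keep only confident detections and read them out as a short description.
found = {
    categories[int(label)]
    for label, score in zip(detections["labels"], detections["scores"])
    if score > 0.8
}
print("I can see: " + ", ".join(sorted(found)))
```

A detector like this can only name generic categories such as “keys” or “backpack”; telling *your* keys apart from anyone else’s is exactly the personalization gap the ORBIT project targets.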

Apps like Seeing AI use artificial intelligence techniques in computer vision to recognize items. While AI is making great strides toward improving computer vision solutions for many applications, such as automated driving, there are still areas where it does not work so well, and personalized object recognition is one of them. Previous research has begun to address the problem by looking at how people who are blind or low vision take pictures, which algorithms could be used to personalize object recognition, and which kinds of data are best suited to enabling it.

However, research is currently held back by the lack of data available for training and then evaluating AI algorithms for personalized object recognition. Most datasets in computer vision comprise hundreds of thousands or even millions of images. The datasets currently available for personal object recognition, by contrast, come from only tens of users and contain only hundreds of images. In addition, there has been no effort to collect images of objects that may be particularly important to users who are blind or low vision. Providing a larger dataset that researchers and developers can use to build better AI systems could be a game changer in this area, for people who are blind or low vision in particular but also for everyone.

By funding the ORBIT project, Microsoft AI for Accessibility hopes to help researchers construct a large dataset from users who are blind or low vision, which will help further advance AI as it relates to personalizing object recognition. Researchers from City, University of London (opens in new tab), Microsoft Research, and University of Oxford are collaborating in this effort. Collaborators include the authors of this blog post along with Toby Harris, Mobile App Developer at City, University of London; Katja Hofmann (opens in new tab), Principal Researcher at Microsoft Research Cambridge; and Luisa Zintgraf (opens in new tab), PhD student at the University of Oxford and Research Intern at Microsoft Research Cambridge.

Unlike previous research efforts, we will collect videos since they provide a richer set of information than images. Our research is also focused on providing realistic testing data so that any new algorithms can be rigorously evaluated. We anticipate that our dataset might be useful for implementations in existing apps, like Seeing AI, and also in novel wearable systems like Project Tokyo (opens in new tab), but our team is keen to future-proof the dataset for new applications that are yet to be imagined. The dataset will be made publicly available for download in two phases: Phase 1 will include about 100 users and thousands of videos, while Phase 2 will gather data from about 1,000 users and contain more than 10,000 videos.

Figure 1: A word cloud showing the frequency of objects captured on video in the pilot study. The larger the word, the more often it occurred.

To help us with our research, we have conducted a pilot study to investigate how to collect these videos. During that pilot study, we gathered 193 videos from eight people who are blind or low vision, each of whom was asked to take videos of five different objects. For each object, participants recorded videos in different settings around their homes and with different filming techniques, some of which they were free to choose. Common items captured on video included users’ own white canes, keys, glasses, remote controls, bags, and headphones, many of which users wanted to identify and locate. The videos also included some rather surprising items, such as crisps (or chips, to Americans). The quality of the videos from the pilot study was good for machine learning purposes, revealing no serious issues with lighting or items being out of frame.
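As an aside for the technically curious, a frequency word cloud like the one in Figure 1 takes only a few lines to produce. The sketch below uses the open-source wordcloud package, and the label counts are made-up placeholders rather than the pilot’s actual numbers:

```python
# Sketch: generate a frequency word cloud from object labels.
# The counts here are illustrative placeholders, not the pilot's real data.
from wordcloud import WordCloud

label_counts = {
    "keys": 12, "white cane": 10, "glasses": 8,
    "remote control": 7, "bag": 6, "headphones": 6, "crisps": 2,
}

cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate_from_frequencies(label_counts)
cloud.to_file("orbit_pilot_wordcloud.png")  # larger words = more frequent
```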

This pilot dataset is available for public use while we develop the larger datasets. Initial investigations by researchers from Microsoft Research have shown that a convolutional neural network trained on Mini-ImageNet (opens in new tab), using 50 random frames drawn from the videos, can reach an accuracy of nearly 50%. A more sophisticated technique, which applies a frame-weighting mechanism supplemented with training on Meta-Dataset (opens in new tab), can improve results by 10 percentage points.
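To illustrate the two ideas mentioned above, here is a minimal sketch (not the ORBIT baselines themselves) of recognizing a personal object from video: embed sampled frames with a pretrained backbone, average the embeddings into a per-object prototype, and optionally learn per-frame weights so that blurry or off-target frames count for less. The `backbone` and `weight_net` modules are hypothetical stand-ins for networks pretrained on Mini-ImageNet or Meta-Dataset:

```python
# Sketch: personalize object recognition by nearest-prototype matching
# over frame embeddings. `backbone` and `weight_net` are stand-ins for
# pretrained networks; this is an illustration, not the ORBIT baselines.
import torch
import torch.nn.functional as F

def embed(frames: torch.Tensor, backbone: torch.nn.Module) -> torch.Tensor:
    """Map a batch of frames (N, C, H, W) to embeddings (N, D)."""
    with torch.no_grad():
        return backbone(frames)

def build_prototype(frames, backbone, weight_net=None):
    """Summarize one object's enrollment video as a single embedding.

    With weight_net=None this is plain frame averaging; a learned
    frame-weighting network can down-weight blurry or off-target
    frames before averaging.
    """
    z = embed(frames, backbone)                      # (N, D)
    if weight_net is None:
        return z.mean(dim=0)                         # uniform average
    w = torch.softmax(weight_net(z).squeeze(-1), 0)  # per-frame weights
    return (w.unsqueeze(-1) * z).sum(dim=0)          # weighted average

def recognize(query_frames, prototypes, backbone):
    """Return the index of the personal object nearest the query clip."""
    q = embed(query_frames, backbone).mean(dim=0)
    sims = F.cosine_similarity(q.unsqueeze(0), torch.stack(prototypes))
    return int(sims.argmax())
```

Uniform averaging corresponds to the simple baseline; a learned frame weighting of this kind is the sort of refinement the more sophisticated technique adds.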

The pilot also highlighted that collecting videos is tricky: recording must be easy for users who are blind or low vision, while the resulting data must still be useful for machine learning. Clear instructions on how to take videos of objects are therefore important, so that the AI can eventually learn to identify those objects. This also presents an opportunity for users to learn more about how AI works. We’re currently at the beginning stages of developing a curriculum for users who want to educate themselves about AI, from a base-level understanding to more advanced concepts such as how AI is built.

The next step in our project is to gather video data for the first phase of our dataset release. To do this, we are recruiting users who are blind or low vision in the UK to record videos of things that are important to them. We have built an iPhone app, the ORBIT Camera, for gathering the videos; it includes guidance for users on how to film the things that they want to have recognized. For anyone interested in taking part, please visit the ORBIT dataset homepage (opens in new tab) for more information on the study and how to sign up. Later in the year, there will be another opportunity for users who are blind or low vision worldwide to contribute to our project.
