Sign language is the primary language for many deaf and hard-of-hearing people. But it currently is not possible for these people to interact with computers using their native language.
Not everyone understands sign language, and human sign-language interpreters are not always available, so researchers in recent years have spent a great deal of time studying the challenges of sign-language recognition. They have examined the potential of input sensors such as data gloves and special cameras. Data gloves provide good recognition results but are inconvenient to wear and have proven too expensive for mass use. Web cameras, meanwhile, struggle with tricky real-world backgrounds and illumination outside the controlled conditions that enable accurate hand tracking.
Then along came a device called the Kinect. Researchers from Microsoft Research Asia have collaborated with colleagues from the Institute of Computing Technology at the Chinese Academy of Sciences (CAS) to explore how Kinect’s body-tracking abilities can be applied to the problem of sign-language recognition. Results have been encouraging in enabling people whose primary language is sign language to interact more naturally with their computers, in much the same way that speech recognition does.
“From our point of view,” says CAS Professor Xilin Chen, “the most significant contribution is that the project demonstrates the possibility of sign-language recognition with readily available, low-cost 3-D and 2-D sensors.”
The work, facilitated and supported by Microsoft Research Connections, is summarized in the paper Ming Zhou, principal researcher at Microsoft Research Asia.
Kinect, with its ability to provide depth information and color data simultaneously, makes it easier to track hand and body actions more accurately—and quickly.
The project is being shown during the DemoFest portion of Faculty Summit 2013, which brings more than 400 academic researchers to Microsoft headquarters to share insight into impactful research. In it, hand tracking leads to a process of 3-D motion-trajectory alignment and matching for individual words in sign language. The trajectory for each word is generated via hand tracking by the Kinect for Windows software and then normalized, and matching scores are computed to identify the most relevant candidates when a signed word is analyzed.
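To make the idea of trajectory normalization and matching concrete, here is a minimal Python sketch. It is not the researchers' algorithm, whose details the article does not give. It assumes each signed word is a list of (x, y, z) hand positions, resamples every trajectory to a fixed number of points, removes position and scale, and turns the mean point-to-point distance into a score. The function names, the 32-point resampling, and the scoring formula are all illustrative assumptions.

```python
import math


def resample(traj, n=32):
    """Resample a 3-D trajectory to n points spaced evenly along its arc length."""
    if not traj or n < 2:
        raise ValueError("need a non-empty trajectory and n >= 2")
    # Cumulative arc length at each original point.
    dists = [0.0]
    for p, q in zip(traj, traj[1:]):
        dists.append(dists[-1] + math.dist(p, q))
    total = dists[-1]
    if total == 0.0:  # the hand did not move
        return [tuple(traj[0])] * n
    out, j = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while j < len(traj) - 2 and dists[j + 1] < target:
            j += 1
        seg = dists[j + 1] - dists[j]
        t = 0.0 if seg == 0.0 else (target - dists[j]) / seg
        out.append(tuple(a + t * (b - a) for a, b in zip(traj[j], traj[j + 1])))
    return out


def normalize(traj, n=32):
    """Resample, move the centroid to the origin, and scale to unit RMS radius."""
    pts = resample(traj, n)
    cx, cy, cz = (sum(c) / n for c in zip(*pts))
    centered = [(x - cx, y - cy, z - cz) for x, y, z in pts]
    scale = math.sqrt(sum(x * x + y * y + z * z for x, y, z in centered) / n)
    if scale == 0.0:
        return centered
    return [(x / scale, y / scale, z / scale) for x, y, z in centered]


def matching_score(a, b, n=32):
    """Score in (0, 1]; 1.0 means the normalized trajectories coincide."""
    na, nb = normalize(a, n), normalize(b, n)
    mean_dist = sum(math.dist(p, q) for p, q in zip(na, nb)) / n
    return 1.0 / (1.0 + mean_dist)


def rank_candidates(query, gallery, top_k=3):
    """Return the top_k (word, score) pairs, best match first."""
    scored = [(word, matching_score(query, ref)) for word, ref in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical gallery: two made-up "words" with toy reference trajectories.
gallery = {
    "line": [(float(i), 0.0, 0.0) for i in range(5)],
    "arc": [(math.cos(math.pi * k / 8), math.sin(math.pi * k / 8), 0.0)
            for k in range(9)],
}
# The same straight-line motion, performed larger and elsewhere in space.
query = [(10.0 + 2.0 * i, 5.0, -3.0) for i in range(9)]
print(rank_candidates(query, gallery, top_k=2))
```

Because the sketch removes only position and scale, it would treat the same motion performed at a different angle to the sensor as a different word. A practical system would need a more careful alignment step, along with handling for hand shape and two-handed signs.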
The algorithm for this 3-D trajectory matching, in turn, has enabled the construction of a system for sign-language recognition and translation with two modes. The first, Translation Mode, translates sign language into text or speech. The technology currently supports American Sign Language but has potential for all varieties of sign language.
The second, Communications Mode, enables communications between a hearing person and a deaf or hard-of-hearing person by use of an avatar. Guided by text input from a keyboard, the avatar can display the corresponding sign-language sentence. The deaf or hard-of-hearing person responds using sign language, and the system converts that answer into text.
Does it work? Surprisingly well.
“One unique contribution of this project is that it is a joint effort between software researchers and the deaf and hard of hearing,” Zhou says. “A group of teachers and students from Beijing Union University joined this project, and this enabled our algorithms to be conducted on real-world data.”
Indeed, the collaboration between Microsoft Research and academia was central to the project.
“We have been thrilled to see Kinect adopted by so many researchers since its launch,” says Stewart Tansley, a director for Microsoft Research Connections who is responsible for Microsoft’s academic research partnerships related to natural user interfaces. “This project exemplifies the strong collaborative bonds between academia and Microsoft Research, as well as the continuing potential for technology to improve lives across languages and cultures around the world, ultimately bringing us all closer together.”
And while the research is valuable in the realm of visual information processing, it also is intended to provide practical assistance to people who communicate primarily in sign language.
“We believe that IT should be used to improve daily life for all persons,” says Guobin Wu, a research program manager from Microsoft Research Asia. “While it is still a research project, we ultimately hope this work can provide a daily interaction tool to bridge the gap between the hearing and the deaf and hard of hearing in the near future.”