By Janie Chang, Writer, Microsoft Research
From May 5 to 10, the Austin (Texas) Convention Center will be abuzz with workshops, demonstrations, and presentations as 2,500 attendees from more than 40 countries participate in the Association for Computing Machinery’s (ACM’s) 30th annual Conference on Human Factors in Computing Systems (CHI 2012). Microsoft Research is contributing 41 papers and five notes that delve into a wide variety of research areas, such as natural user interfaces, technologies for developing countries, social networking, health care, and search.
Nine of Microsoft Research’s papers and notes received honorable mention from the conference program committee, and Kevin Schofield, general manager and chief operations officer for Microsoft Research, will receive the SIGCHI Lifetime Service Award for contributing to the growth of the ACM’s Special Interest Group on Computer Human Interaction and for his influence on the community at large.
For Desney S. Tan, principal researcher at Microsoft Research Redmond and last year’s conference chair, an unofficial highlight is the fact that 94 percent of Microsoft Research’s papers and notes submitted to CHI 2012 were authored in collaboration with academic partners from universities and institutes around the globe.
“Microsoft Research aims to advance the state of the art and impact society in positive ways. In pursuing this mission, we constantly find ourselves engaging in open collaboration with the scientific and academic communities,” Tan says. “The CHI Conference is a great venue for researchers to share our progress toward blending our physical and virtual worlds, developing richer interaction modalities, and ultimately delivering our vision of natural user interfaces.”
Tan also notes that a number of research projects being presented during CHI use affordable, commercially available technologies to integrate computers into everyday tasks. LightGuide is one such project.
Projecting Hints on the Human Body
Computer-aided instruction is nothing new; it has been a research focus for decades. Some approaches combine computer instruction with videos, while others opt for real-time feedback or augmented reality. LightGuide: Projected Visualizations for Hand Movement Guidance, by the Microsoft Research Redmond team of intern Rajinder Sodhi from the University of Illinois at Urbana–Champaign, Hrvoje Benko, and Andy Wilson, explores a new approach to guiding body movement by projecting visual hints directly onto the user’s body. The LightGuide proof-of-concept implementation restricts the experiment to hand movements and the projection of hints to the back of the user’s hand, but the impetus behind the paper began with the challenge of learning physical activities that require skilled instruction.
“Think of physiotherapy,” Sodhi says. “The physical therapist prescribes certain exercises and corrects the patient’s movements during the session. But then, the patient must go home and repeat those exercises, and if his positioning is wrong, his efforts could be ineffective. If we can track his movements and provide visual guidance, it’s like having a virtual physiotherapist. The patient will make better progress.”
The team had been interested in combining projection and motion-sensing technologies to turn any surface—including the body—into a display-and-feedback mechanism. The main goal of LightGuide was to determine whether users were comfortable using projected visual cues on their hands to follow guided movements and how accurate those movements were compared with movements guided by conventional video instructions. The first challenge was to design a series of visual hints.
“We started out with perhaps 20 concepts,” Sodhi recalls, “and, after some initial testing, settled on four types of simple cues. Then we had to develop software to support those cues. We used a commercially available Kinect depth camera and a standard projector mounted to the ceiling. The camera tracked the user’s hand movement, and our algorithms adapted the visual cues in real time to project correctly and in perspective.”
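In broad strokes, that loop amounts to tracking the hand each frame, comparing it with the next target position, and rendering a cue warped into the projector's view. The following is a minimal illustrative sketch in Python, not the team's code: get_hand_position, render_arrow_cue, and project_image are hypothetical stand-ins for the depth-camera and projector interfaces, and the camera-to-projector mapping is deliberately simplified.

import numpy as np

def get_hand_position():
    # Hypothetical stand-in for the depth camera: return the tracked 3-D
    # hand position in camera coordinates (meters).
    return np.array([0.10, 0.25, 0.80])

def render_arrow_cue(hand_px, target_px, size=(720, 1280)):
    # Hypothetical stand-in for the renderer: draw an arrow from the hand
    # toward the target in projector pixel space.
    return np.zeros(size, dtype=np.uint8)

def project_image(image):
    # Hypothetical stand-in for the ceiling-mounted projector output.
    pass

# Simplified pinhole-style mapping from camera space to projector pixels,
# determined once during calibration (values here are arbitrary).
K = np.array([[1200.0,    0.0, 640.0],
              [   0.0, 1200.0, 360.0],
              [   0.0,    0.0,   1.0]])

def to_projector_pixels(point_3d):
    p = K @ (point_3d / point_3d[2])   # perspective divide, then projection
    return p[:2]

# Guide the hand through a short sequence of target positions.
waypoints = [np.array([0.10, 0.20, 0.80]), np.array([0.10, 0.10, 0.80])]
for target in waypoints:
    hand = get_hand_position()
    cue = render_arrow_cue(to_projector_pixels(hand), to_projector_pixels(target))
    project_image(cue)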
The most important part of the work, however, came during and after the user study, when Sodhi and his teammates analyzed the results. Those results exceeded expectations: participants performed the simple hand movements nearly 85 percent more accurately than when guided through the same movements by video.
Sodhi emphasizes that while the next steps to expand the experiment include projection on other body parts and guiding wider ranges of motion, the team does not view its approach as a singular solution.
“We recognize that projected visual hints on the body are optimal for certain guided movements,” he says, “but alternative forms of computer-aided movement guidance and video work better in other situations. We see this approach being used very effectively in combination with other technologies.”
Using Sound to See
Another project from a Microsoft Research Redmond team is SoundWave: Using the Doppler Effect to Sense Gestures. The SoundWave project relies on hardware readily available on computers, laptops, and even mobile devices—the microphone and speaker—to sense motion and simple gestures.
The paper’s authors—intern Sidhant Gupta from the University of Washington, Dan Morris of Microsoft Research, Shwetak N. Patel of Microsoft Research and the University of Washington, and Tan—regard SoundWave as an example of the sort of serendipitous discovery that characterizes many great inventions.
Gupta and Patel, both of whom have collaborated on multiple projects within Microsoft Research, remember working on one that used ultrasonic sensors when something curious happened.
“I was sitting in my lab chair, measuring signals and kind of jiggling my leg,” Gupta recalls. “I saw the signal change when it should not have moved, so I thought there was a loose connection or something. But as soon as I got up from my chair to check, the error went away. I sat down to work, started moving my leg, and the signal changed again. After a couple of minutes of this, I realized it was detecting motion.”
One thing led to another, and after investigation, the team concluded that it was experiencing a well-understood phenomenon, the Doppler effect, which characterizes the frequency change of a sound wave as a listener moves toward or away from the source. A common example is the change in pitch of a vehicle siren as it approaches, passes, and then moves away from the listener.
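In SoundWave's case, the tone reflects off a moving hand and returns to a co-located speaker and microphone, so the shift is roughly twice the one-way Doppler shift. A back-of-the-envelope calculation in Python (the 18 kHz tone and 0.5 m/s hand speed are illustrative numbers, not figures from the paper):

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

def reflected_doppler_shift(tone_hz, hand_speed_mps):
    # Approximate shift of a tone reflected off a hand moving directly
    # toward (positive speed) or away from (negative speed) the device.
    return 2.0 * hand_speed_mps * tone_hz / SPEED_OF_SOUND

print(reflected_doppler_shift(18_000, 0.5))  # roughly 52 Hz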
SoundWave uses speakers to emit a continuous, inaudible tone that reflects off objects such as a moving hand or body. The microphone picks up the reflected signal, and software algorithms “detect” motion by interpreting changes in frequency. The team found that it could measure movement properties such as velocity, direction, proximity, the size of a moving object, and time variation—the rate of change of the other properties. This enabled the researchers to create a few simple hand gestures that could be used to control an application: scrolling down a page; single and double taps; and a two-handed, seesaw motion they used to play Tetris.
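A minimal sketch of that detection step, assuming a single pilot tone and NumPy's FFT; the sample rate, tone frequency, analysis band, and threshold below are illustrative choices, and the real system's gesture logic is considerably more involved.

import numpy as np

SAMPLE_RATE = 44_100   # Hz
TONE_HZ = 18_000       # inaudible pilot tone (illustrative choice)

def motion_spread(mic_frame, threshold_db=-40.0):
    # Measure how far energy has spread around the pilot tone in one
    # windowed frame of microphone samples. Motion toward the device
    # shifts reflected energy above the tone; motion away shifts it below.
    windowed = mic_frame * np.hanning(len(mic_frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(mic_frame), 1.0 / SAMPLE_RATE)
    db = 20.0 * np.log10(spectrum / (spectrum.max() + 1e-12) + 1e-12)
    band = (freqs > TONE_HZ - 500) & (freqs < TONE_HZ + 500)
    active = freqs[band][db[band] > threshold_db]
    if active.size == 0:
        return 0.0, 0.0
    # (spread below tone, spread above tone): crude direction and speed cues.
    return TONE_HZ - active.min(), active.max() - TONE_HZ

In this simplified picture, the size of the shift suggests velocity, its sign suggests direction, and the strength of the reflected energy gives a rough sense of proximity.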
One of SoundWave’s goals was to have the software work across a range of computers—which means accommodating microphones and speakers of varying characteristics, located in different places on different models. SoundWave therefore performs an automatic calibration to find the optimal tone frequency. This calibration step also helped SoundWave cope with noisy ambient conditions, such as when the system was tested in a café.
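One plausible shape for such a calibration, sketched under the assumption that the host audio API exposes simple playback and capture calls (play_tone and record below are hypothetical placeholders), is to sweep a set of candidate inaudible frequencies and keep the one whose tone stands out most clearly from ambient noise.

import numpy as np

def calibrate_tone(play_tone, record, candidates=range(18_000, 22_001, 500),
                   sample_rate=44_100):
    # Score each candidate frequency by how strongly the emitted tone shows
    # up relative to the surrounding high-frequency ambient noise.
    best_hz, best_score = None, float("-inf")
    for hz in candidates:
        play_tone(hz)                 # hypothetical playback call
        frame = record(0.1)           # hypothetical capture of 0.1 s of audio
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / sample_rate)
        tone_power = spectrum[np.argmin(np.abs(freqs - hz))]
        ambient = np.median(spectrum[(freqs > 17_000) & (freqs < 23_000)])
        score = tone_power / (ambient + 1e-12)
        if score > best_score:
            best_hz, best_score = hz, score
    return best_hz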
The team found that playing music from the same laptop has no impact on gesture recognition, because the frequencies seldom conflict. The researchers also discovered that controlling speaker volume regulates detection distance—an advantage in crowded situations in which the user wants to avoid picking up surrounding movement.
The SoundWave team believes its work represents a promising approach: a software-only solution for sensing in-air gestures that does not require specialized hardware. The team also notes that using the Doppler effect inherently limits detection to motion gestures; this approach would need to work in combination with other techniques to detect static poses. In the meantime, they see value in extending the gesture set and exploring possibilities with newer mobile devices that feature multiple speakers and microphones.