Project PRISM

Early in 2016 our team began developing augmented reality (AR) headsets by adding stereo cameras to a virtual reality (VR) system. The user sees the real world through the video feed from the cameras. Synthetic imagery is composited with the video to create a low-cost AR system.

This form of AR has several advantages over optical see-through AR:

1. Low-cost camera modules add roughly $40 in parts cost to a VR headset

2. Wide field of view, more than 100 degrees both horizontally and vertically

3. Works in full sunlight, with truly opaque synthetic imagery. This improves color rendition and contrast of synthetic imagery even indoors.

Our first prototypes used off-the-shelf machine vision cameras attached to a commercial VR headset with a display resolution of approximately 1 megapixel per eye at 60 Hz or 90 Hz (Figs. 1, 2). Two USB 3.0 cables transmit the camera images to the host CPU.

We wrote our own real-time image signal processing pipeline to demosaic the Bayer image, correct for camera lens distortion, and correct color. Photon-to-photon latency is approximately 50 ms. This delay can be corrected surprisingly well with a late-stage reprojection homography.
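The late-stage reprojection can be illustrated for the simplest case, a pure head rotation between image capture and display. The sketch below is not the project's pipeline. It assumes an ideal pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) and uses the standard rotation-only homography H = K R K^-1. All function names are illustrative.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def yaw_rotation(theta):
    """Rotation by theta radians about the camera's y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def reprojection_homography(fx, fy, cx, cy, r_new_from_old):
    """Build H = K R K^-1 for a pinhole camera (assumed model).

    r_new_from_old maps point coordinates in the capture-time camera
    frame to the display-time camera frame.
    """
    k = [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
    # Closed-form inverse of the pinhole intrinsic matrix.
    k_inv = [[1.0 / fx, 0.0, -cx / fx],
             [0.0, 1.0 / fy, -cy / fy],
             [0.0, 0.0, 1.0]]
    return matmul3(matmul3(k, r_new_from_old), k_inv)

def warp_pixel(h, u, v):
    """Apply homography h to pixel (u, v) and dehomogenize."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return x / w, y / w

# Hypothetical example: the head turned 0.05 rad during the latency window.
h = reprojection_homography(500.0, 500.0, 320.0, 240.0, yaw_rotation(0.05))
u_new, v_new = warp_pixel(h, 320.0, 240.0)
```

In a real system the rotation would come from the head tracker's pose at capture time and its predicted pose at display time, and the warp would be applied to the whole image on the GPU. A rotation-only homography ignores head translation and scene depth, so it is an approximation.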

Figure 1 Figure 2

The cameras were synchronized to the displays by a custom sync board (Fig. 3), designed and built in the Microsoft Research (MSR) Labs hardware lab. The board was triggered by the vertical refresh signal from the display circuit on the head-mounted display (HMD). This signal is not available on the exterior of any of the commercial headsets we used, so we opened the HMD case and probed pins on the display circuit board to find one with the correct display update frequency.

Figure 3

These early prototypes were promising enough to encourage us to make a second-generation headset, which we began in the summer of 2017. The biggest weaknesses of the first generation were the bulk and weight of the large-form-factor machine vision cameras and the low resolution of the displays. Both have been addressed in the second generation.

The new headsets have a custom camera control circuit board designed and built by the MSR Labs hardware lab. This uses a rolling-shutter image sensor, the OmniVision OV4689, which can capture 4M pixels at 90Hz, within a cell phone form factor. The cameras are again synchronized to the VR headset displays by tapping the sync signal from the display controller board, but the sync circuitry is contained completely in the camera controller board.

The new camera module is much smaller and lighter than the smallest off-the-shelf machine vision system of equivalent resolution and frame rate (Figs. 4,5). Image quality is also surprisingly good given the small 2-micron-square pixels.

Figure 4 Figure 5

The sensor video is transferred to the host computer over USB 3.0 cables via a Cypress CX3 MIPI-to-USB bridge chip. Bandwidth limitations in this chip restrict 90 Hz capture to 1920×1440, not quite the full 2688×1520 resolution the sensor is capable of.
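To give a sense of the data rates involved, the following sketch compares the raw video bit rate of the two capture modes at 90 Hz. The bit depth is an assumption: the text does not state the pixel format, so 10 bits per pixel (a common raw Bayer format) is used purely for illustration, and protocol overhead is ignored.

```python
def video_bitrate(width, height, fps, bits_per_pixel):
    """Raw video bit rate in bits per second, ignoring protocol overhead."""
    return width * height * fps * bits_per_pixel

BITS_PER_PIXEL = 10  # assumption, not stated in the text

# Resolutions and frame rate are taken from the text.
reduced_bps = video_bitrate(1920, 1440, 90, BITS_PER_PIXEL)
full_bps = video_bitrate(2688, 1520, 90, BITS_PER_PIXEL)

# Fraction of the full-resolution data rate carried by the reduced mode.
fraction_of_full = reduced_bps / full_bps

print(f"reduced: {reduced_bps / 1e9:.2f} Gbit/s")
print(f"full:    {full_bps / 1e9:.2f} Gbit/s")
print(f"ratio:   {fraction_of_full:.2f}")
```

Under this assumption the reduced mode needs about 2.49 Gbit/s per camera, against about 3.68 Gbit/s for the full sensor resolution, or roughly two thirds of the data. The ratio does not depend on the assumed bit depth.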

The MSR Labs Central Engineering team wrote custom camera-controller firmware so the camera frame rate, exposure, and other sensor parameters can be set in software on the host PC. They also wrote custom USB drivers to handle the high-data-rate video coming over the USB 3.0 cables. Photon-to-photon latency is approximately 50 ms, again corrected with a late-stage reprojection homography.

Figure 6 Figure 7

The higher resolution significantly improves the view of the real world, and the lightweight camera modules make the headset far more comfortable than the first-generation prototypes. In addition, the inside-out tracking of the Mixed Reality headset removes the mobility limitations of our previous prototypes. This system uses a backpack computer and is fully mobile (Figs. 6, 7).

Approximately 40 of these headsets were produced and are being used for research in various groups inside Microsoft. Calibrating this many headsets manually is unreasonably time-consuming, so we created a robotic camera calibration system to do it automatically (Fig. 8).

Figure 8

We are actively developing future prototypes with latency in the 5 to 10 ms range. Higher-resolution prototypes will be made as the resolution of off-the-shelf HMDs increases.