Manipulating Space and Time in Mixed Reality

Virtual Reality (VR) and Augmented Reality (AR) each have distinct advantages when it comes to bringing digital capabilities into our everyday lives. In VR, everything users see, including the environment and every object in it, can be controlled and changed. This also means that what users see doesn’t necessarily mirror the real world; if another user enters the room, for example, she is not represented in VR. AR, on the other hand, lets users enrich their physical environment with virtual information. Virtual objects can be added to an environment, and real-world objects can be visually altered or removed.

Many of those changes, however, are challenging to achieve. To remove a physical object, say to simulate remodeling a room, the technology must overlay the physical object with virtual content so that the object is no longer visible. This requires precise alignment of virtual and physical objects, the ability to recover the background behind the physical object, and a display capable of overlaying the object with that extracted background so convincingly that the original object disappears. Since typical mixed reality systems rely on front-facing cameras and head-mounted optical or video see-through displays, such changes are hard to achieve, for example because a front-facing camera cannot see the environment behind the physical object that should be removed.

Figure 1. Manipulating space and time in mixed reality. Users see their environment as a 3D reconstruction through a head-mounted VR display. They can perform changes such as removing objects from the room or teleporting themselves to a different location within it.

We set out to combine the benefits of VR and AR by creatively manipulating space and time in mixed reality. We built a mixed reality system in which users see a live representation of their environment. In contrast to other approaches, users do not see the image from a head-mounted camera. Instead, we equipped the room with eight ceiling-mounted Kinect cameras (see Figure 2). The data from the cameras is combined into a single full-room 3D reconstruction. Users wear a VR display and see a reconstructed live view of the environment. We match the user’s virtual viewpoint to their physical position in the room, which means that physical objects and their virtual representations are aligned. Since the data is live, any changes in the physical environment (for example, furniture moving or people entering the room) are visible in real time. Users can still touch and feel the real world while gaining the ability to perform nearly arbitrary visual changes.

Figure 2. Users see the environment through a 3D reconstruction captured with eight ceiling-mounted Kinect cameras.
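
To make the pipeline concrete, here is a minimal sketch of the fusion step: depth frames from several calibrated ceiling cameras are back-projected into 3D, transformed into one shared room coordinate frame, and the reconstruction is then rendered from the tracked head pose so the virtual viewpoint matches the user’s physical position. This is an illustration only, not our implementation; the function names, the (fx, fy, cx, cy) intrinsics format, and the tracking call are assumptions.

```python
# Minimal sketch (not the authors' code): fuse depth frames from several
# calibrated ceiling cameras into one world-space point cloud, then render it
# from the tracked head pose so physical and virtual views stay aligned.
import numpy as np

def depth_to_points(depth_m, fx, fy, cx, cy):
    """Back-project a depth image (meters) into camera-space 3D points."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                    # drop invalid (zero-depth) pixels

def fuse_cameras(frames, intrinsics, extrinsics):
    """Transform each camera's points into a shared room coordinate frame."""
    clouds = []
    for depth, K, T_world_cam in zip(frames, intrinsics, extrinsics):
        pts = depth_to_points(depth, *K)                    # camera space
        pts_h = np.c_[pts, np.ones(len(pts))]               # homogeneous coords
        clouds.append((pts_h @ T_world_cam.T)[:, :3])       # world space
    return np.vstack(clouds)

# Each frame, the virtual eyepoint is simply the tracked physical head pose,
# so the reconstruction lines up with what the user can touch and feel:
#   render(fuse_cameras(frames, intrinsics, extrinsics), view=tracked_head_pose)
```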

We see these types of space and time manipulations as a new form of see-through AR that lets users experience their physical environment while also enabling changes that typically are possible only in VR. Users can easily “erase” geometry in a room (Figure 1c), enabling applications such as simulated room remodeling. Since the Kinect cameras cover the environment from multiple angles, users can erase parts of the room and see the environment behind them. This is challenging to achieve with conventional AR that relies on a front-facing camera, because the environment behind objects is occluded.
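
As a rough illustration of the erase operation (not the actual implementation), removing an object can be thought of as culling, every frame, the points of the live reconstruction that fall inside a user-selected region; because other cameras see behind that region, the background remains visible. The helper below and its box-shaped selection are simplifying assumptions.

```python
# Minimal sketch, assuming the live scene is a fused world-space point cloud:
# "erasing" an object amounts to culling the points inside a user-selected box
# each frame. Geometry captured behind the box by the other cameras remains,
# so the background shows through where the object used to be.
import numpy as np

def erase_region(points, box_min, box_max):
    """Return the cloud with all points inside the axis-aligned box removed."""
    inside = np.all((points >= box_min) & (points <= box_max), axis=1)
    return points[~inside]

# Applied per frame to the live reconstruction, e.g. to hide a table:
#   visible = erase_region(fused_cloud, box_min=[1.0, 0.0, 2.0], box_max=[2.0, 1.0, 3.0])
```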

These space and time manipulations also make it possible to recolor, move, or copy objects. An example of “cloning” an object is shown in Figure 3. Since all changes are applied continuously to the live geometry, any moved or copied object continues to update dynamically in real time after the operation is invoked.

Figure 3. The chair is cloned (copied and moved by the user). While only one chair exists in the physical world (left), the user sees two exact copies (right). This is possible because both virtual chairs consist of the same 3D geometry.
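
The cloning behavior can be sketched in the same spirit: the operation stores a selection region and a rigid transform and is re-applied to the live geometry every frame, which is why the copy keeps updating in real time. The names and the 4x4 transform convention below are illustrative assumptions, not the actual code.

```python
# Minimal sketch of the "clone" idea: duplicate the points inside a selected
# region and place the copy at a rigid offset. Re-running this on every live
# frame keeps both the original and the copy updating in real time.
import numpy as np

def clone_region(points, box_min, box_max, T_offset):
    """Duplicate the points inside the box and place the copy at T_offset."""
    inside = np.all((points >= box_min) & (points <= box_max), axis=1)
    selected = points[inside]
    copy = (np.c_[selected, np.ones(len(selected))] @ T_offset.T)[:, :3]
    return np.vstack([points, copy])        # original stays; the copy is added

# Example: shift the cloned chair one meter along x (box bounds are hypothetical).
T = np.eye(4); T[0, 3] = 1.0
#   frame_cloud = clone_region(frame_cloud, chair_min, chair_max, T)
```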

Temporal changes are also possible. Users can halt time (that is, suspend receiving live data) and move freely in their static reconstructed environment. They can also record events (for example, meetings) and play them back at any desired speed. The user in Figure 4 (left) paused time while jumping. This effectively presents him with a static 3D reconstruction, meaning he can still move around freely. And because the environment is a 3D model, changing the user’s view is as simple as modifying the graphics camera eyepoint, enabling virtual motion (referred to in VR as teleportation). For example, a user can move to another part of the room virtually to inspect the environment from a different perspective without changing their physical location (examples are shown in Figure 1d and Figure 5). Similarly, we implemented a view portal mechanism with which users can observe other parts of the room (for example, the space behind them) at the same time.

Figure 4. A user paused time (left) and walked around the table to see himself jumping. The right image shows a user sitting at a table with two view portals (one to his left, one behind him). By looking through the left portal, he can see himself from behind.
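
A minimal sketch of these temporal controls, assuming each fused reconstruction arrives as a timestamped frame: pausing keeps serving the last frame, playback re-reads recorded frames at any speed, and teleportation is just an extra offset applied to the rendering eyepoint. The class below is hypothetical and only meant to show the idea.

```python
# Minimal sketch of pause / record / playback over the live reconstruction.
import collections

class TimeControl:
    """Serve either the live reconstruction, a frozen frame, or a recording."""
    def __init__(self, capacity=3000):                 # ~100 s of frames at 30 fps
        self.recording = collections.deque(maxlen=capacity)
        self.current = None                            # frame handed to the renderer
        self.paused = False
        self.playback_speed = 1.0                      # e.g. 0.5 = slow motion
        self.playback_pos = 0.0

    def on_live_frame(self, frame):
        """Called whenever a new fused reconstruction arrives."""
        self.recording.append(frame)
        if not self.paused:
            self.current = frame                       # live view

    def step_playback(self):
        """Advance through the recorded frames at the chosen speed."""
        if self.recording:
            self.playback_pos = min(self.playback_pos + self.playback_speed,
                                    len(self.recording) - 1)
            self.current = self.recording[int(self.playback_pos)]

# Virtual motion ("teleportation") never moves the user physically; the render
# camera is simply the tracked head pose composed with a user-chosen offset:
#   render(controller.current, view=teleport_offset @ tracked_head_pose)
```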

Because these experiments combine data from multiple Kinect cameras, we also implemented the ability to include a remote location within the user’s current space. This interaction is inspired by earlier work such as Holoportation. Figure 5 shows a remote user and his environment rendered in the user’s current view, in this case as a miniature on the desk. The remote user simply needs a Kinect; the implementation combines the local and remote data.

Figure 5. Showing remote content – in this case a user sitting at a remote desk with a ceiling-mounted Kinect camera. The right image shows a remodeled room: the desk and the couch are virtual objects, while the table in the middle has been erased. This configuration features teleportation to arbitrary viewpoints; here, the user teleported to a top corner of the room to get a bird’s-eye view of the scene.
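
In the same hedged spirit, merging remote content could look like the sketch below: the remote point cloud is scaled down, anchored somewhere in the local room (for example, on the desk), and concatenated with the local reconstruction. The function and parameter names are assumptions for illustration, not the actual implementation.

```python
# Minimal sketch of placing a remote capture inside the local room as a miniature.
import numpy as np

def place_remote(local_pts, remote_pts, scale, anchor):
    """Shrink the remote scene and anchor it at a point in the local room."""
    remote_centered = remote_pts - remote_pts.mean(axis=0)   # center the remote cloud
    miniature = remote_centered * scale + np.asarray(anchor) # scale and position it
    return np.vstack([local_pts, miniature])                 # merge with local scene

# e.g. a 1:10 miniature of the remote desk sitting on the local desk:
#   merged = place_remote(local_cloud, remote_cloud, scale=0.1, anchor=[0.8, 1.1, 0.4])
```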

We envision interesting scenarios for these types of space and time manipulations in mixed reality, from room remodeling and gaming to meetings (both live and played back). This approach extends the space of possibilities in mixed reality and provides a way to seamlessly bridge the virtual and physical worlds. It gives users full control over how they perceive their physical environment, letting them make nearly arbitrary modifications, including those that would not be plausible in the real world.

We envision future versions of mixed reality being capable of altering users’ perception of physics (for example, creating a real-time zero-gravity environment) and even their perception of time (for example, a slow-motion environment). Once users can no longer distinguish what is virtual from what is real, we can seek the most comfortable or appropriate level of mediation. Such a system could be suitable for a wide range of psychophysical experiments (for example, investigating out-of-body perception), or for exploring how it would feel to live in an upside-down world, or to have the ability to zoom or see through walls. This work represents an initial exploration of the field. Much remains to be done, including improving the visual quality of what users see, and we are eager to explore the benefits and limits of our approach. These concepts are early steps toward gaining fuller control over what users see in mixed reality and choosing the level of augmentation we desire.
