Data storage has always been a cornerstone of computing, and with the massive growth in cloud computing, the demand for cloud data storage has opened an avenue both for revisiting prior technologies and for developing new ones. It is projected that around 125 zettabytes of data will be generated annually by 2024, and storing this data cost-effectively will be a major challenge.
The cloud has also changed the way we think about compute and storage. In the cloud, services are virtualized: in cloud data storage, for example, customers pay for storage capacity and access rate rather than for physical storage devices (see Figure 1). This virtualization creates new opportunities to design and optimize technologies that are uniquely adapted to the cloud. This is particularly interesting in storage, since all current storage media were created in the pre-cloud era. It opens the door for new storage devices with different features, both to complement the storage technologies we deploy today and to solve some of the challenges that the cloud has placed on storage.
Microsoft Research is taking on these challenges head on in its Optics for the Cloud program, where researchers are investigating new ways to improve storage, compute, and networking by bringing together different areas of technical expertise to identify new applications for optics. We see a real opportunity to make an impact by combining optical physicists and engineers with the expertise that Microsoft has in computer systems and artificial intelligence (AI). AI in particular offers great potential in this space as it continues to make rapid advances in deep learning and beyond. In optical storage specifically, researchers are especially interested in opportunities to meet the current and future storage demands of the cloud.
At Microsoft Research Cambridge, in collaboration with Microsoft Azure, we’ve been investigating new cloud-first optical storage technologies. For several years now, in Project Silica, we’ve been developing an optical storage technology that uses glass as the storage medium, exploiting the longevity of glass for write once read many (WORM) archival storage. In a recent HotStorage paper, we also discussed the challenges that incumbent storage technologies face in the cloud era and the opportunity this brings for new storage technologies; two particular challenges are increasing storage density and increasing access rates. In this blog post, we introduce Project HSD (Holographic Storage Device), a new project that is reimagining how holographic storage can be utilized in the cloud era.
We have found that revisiting this technology now is especially advantageous: the cloud is growing its reach, commodity optics-related components have made large steps forward, and new machine learning techniques can be integrated into the process. In our work so far, we have already achieved 1.8x higher density than the state of the art for volumetric holographic storage, and we are working to increase density and access rates further.
How does holographic storage work?
Holographic storage uses light to record data pages, each holding hundreds of kilobytes of data, as tiny holograms inside a crystal. Each hologram occupies a small volume inside the crystal, which we think of as a zone, and multiple pages can be recorded in the same physical volume or zone. A data page is read back out by diffracting a pulse of light off the recorded hologram and capturing the result on a camera, which reconstructs the original data page. The recorded holograms can be erased with UV light, and the media can then be reused to store more holograms, making this a rewritable storage medium.
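To make the write/read/erase behavior concrete, here is a minimal Python sketch of the storage model described above. The `Zone` class, its page capacity, and its methods are hypothetical illustrations of page multiplexing in a zone, not an actual Project HSD interface.

```python
# Illustrative model of page-multiplexed holographic storage.
# All names and numbers are assumptions chosen only to mirror the
# write/read/erase behavior described in the text.

class Zone:
    """A small physical volume in the crystal that can hold many pages."""

    def __init__(self, max_pages: int = 100):
        self.max_pages = max_pages
        self.pages: list[bytes] = []  # each entry stands in for one hologram

    def write_page(self, data: bytes) -> int:
        """Record a data page as a new hologram; returns its page index."""
        if len(self.pages) >= self.max_pages:
            raise RuntimeError("zone full: erase with UV light before rewriting")
        self.pages.append(data)
        return len(self.pages) - 1

    def read_page(self, index: int) -> bytes:
        """Diffract a read pulse off hologram `index` to reconstruct the page."""
        return self.pages[index]

    def erase(self) -> None:
        """UV erase: clears every hologram so the zone can be reused."""
        self.pages.clear()


zone = Zone()
idx = zone.write_page(b"\x00" * 200_000)  # one page holds hundreds of kilobytes
assert zone.read_page(idx) == b"\x00" * 200_000
zone.erase()                              # the medium is rewritable
```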
In contrast, the glass used as a storage medium in Project Silica is meant for long-term archival storage due to its longevity and write-once nature. Holographic storage is a good candidate for warm, read/write cloud storage because it is rewritable and has the potential for fast access rates.
The idea of holographic data storage dates back to the 1960s. By the early 2000s, several research groups, in both academia and industry, had demonstrated the impressive storage densities achievable with holographic storage media.
So why revisit holographic storage as a solution for the cloud?
With today’s storage solutions, access rates are a pain point. Flash storage provides high access rates but is relatively expensive, and many cloud applications keep data on hard disk drives (HDDs). This data is inherently slower to access due to the mechanical nature of HDDs. The inherent parallelism of optics—the ability to write and read multiple bits in parallel—has always been one of the most attractive features of holographic storage.
This parallelism has the potential to provide high data throughput. In addition, seeking or addressing different pages requires only the steering of optical beams rather than the movement of a large mechanical system. Beam steering can be done electronically, at much higher rates than the mechanical seeks of existing storage devices such as HDDs, so holographic storage has the potential for much lower seek latencies and, as a result, higher access rates at cost-effective capacities. This is particularly attractive for the many cloud applications that need high access rates and low tail latency when accessing storage.
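A rough back-of-the-envelope calculation illustrates why electronic seeks matter. All the latency and page-size figures below are assumed values chosen for illustration; the post does not quote specific numbers.

```python
# Back-of-envelope comparison of mechanical vs. electronic (beam-steered) seeks.
# Every figure here is an assumed, order-of-magnitude value for illustration.

hdd_seek_s = 8e-3        # typical HDD mechanical seek, roughly 8 ms
optical_seek_s = 100e-6  # assumed electronic beam-steering time, ~100 us

page_bytes = 200_000     # one holographic page: hundreds of kilobytes
read_pulse_s = 1e-6      # assumed time to capture one page on the camera

# Random-access rate: one page fetched per seek + read.
mechanical_reads = 1 / (hdd_seek_s + read_pulse_s)
steered_reads = 1 / (optical_seek_s + read_pulse_s)

print(f"mechanical seek: ~{mechanical_reads:,.0f} page reads/s")
print(f"beam steering:   ~{steered_reads:,.0f} page reads/s")
print(f"random-read throughput: "
      f"{page_bytes / (optical_seek_s + read_pulse_s) / 1e9:.2f} GB/s")
```

Under these assumed numbers, electronic addressing yields roughly 9,900 random page reads per second versus about 125 for a mechanical seek, which is the gap the text refers to.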
Designing the storage hardware from the ground up for the cloud also frees us from the constraints of consumer devices, such as the need to fit a 2.5-inch or 3.5-inch hard disk form factor. The smallest unit of deployment in cloud storage is the storage rack, which lets us design the new hardware at “rack scale” so that components can be efficiently shared across the entire rack.
Bringing holographic storage to the present with commodity hardware and deep learning
Our team has focused on simultaneously achieving both density and fast access rates. We have deployed recently developed high-power fiber laser systems to reduce the write and read times by over an order of magnitude to support high access rates. We have also exploited the recent developments in high resolution LCOS spatial light modulators and cameras, driven by the display industry and the smartphone industry, respectively, to increase the density. In particular, the high-resolution camera technology is key as this allows us to move complexity from the optical hardware to software.
In the previous state of the art, complex optics were needed to achieve one-to-one pixel matching from the display device to the camera in order to maximize density. Today, we can leverage commodity high-resolution cameras (shown in Figure 3) and modern deep learning techniques to shift that complexity into the digital domain. This lets us use simpler, cheaper optics without pixel matching and compensate for the resulting optical distortions in software running on commodity hardware. This approach also relaxes manufacturing tolerances, as the system can be compensated and calibrated at runtime in software. Using this combination of high-resolution commodity components and deep learning has already enabled us to increase storage density by 1.8x over the state of the art.
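As a minimal sketch of what “shifting complexity into the digital domain” can look like, the PyTorch model below maps a distorted camera capture to per-pixel bit probabilities. The architecture, sizes, and the `PageDecoder` name are assumptions for illustration; the post does not describe the actual model used in Project HSD.

```python
# Minimal sketch of learned page decoding: a small convolutional network
# recovers bit values from a distorted camera image, standing in for the
# deep-learning step described above. Architecture and sizes are assumed.

import torch
import torch.nn as nn

class PageDecoder(nn.Module):
    """Maps a raw camera capture to a logit per pixel position."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),  # one logit per pixel
        )

    def forward(self, camera_image: torch.Tensor) -> torch.Tensor:
        # A real decoder without one-to-one pixel matching would also
        # resample from camera resolution down to the data-page grid.
        return self.net(camera_image)


decoder = PageDecoder()
camera_image = torch.rand(1, 1, 256, 256)  # one captured page (grayscale)
bits = torch.sigmoid(decoder(camera_image)) > 0.5  # hard bit decisions
print(bits.shape)  # torch.Size([1, 1, 256, 256])
```

Trained on captures of known data pages, a model like this can absorb the optical distortions that pixel-matched optics previously had to eliminate in hardware.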
Looking forward: Scaling up optical storage solutions for the cloud
While we have seen compelling write/read times and storage density within a single zone, the challenge in making holographic storage practical for the cloud is to scale capacity by increasing the number of zones while maintaining the same access rates across all of them. Previous approaches in this space simply moved the media mechanically, but this is much too slow.
To address this issue, we are currently working on demonstrating a multi-zone approach that maintains the access rate without mechanical movement. The future goal for holographic storage in Project HSD is to create a technology uniquely tailored to the cloud, one with both fast access rates and a storage density that greatly surpasses its predecessors. To learn more about Project HSD and Optics for the Cloud research, check out Azure CTO Mark Russinovich’s segment at Microsoft Ignite 2020 and visit our project page.