A Content-Addressable DNA Database with Learned Sequence Encodings

24th International Conference on DNA Computing and Molecular Programming

Published by Springer-Verlag


We present strand and codeword design schemes for a DNA database capable of approximate similarity search over a multidimensional dataset of content-rich media. Our strand designs address cross-talk in associative DNA databases, and we demonstrate a novel method for learning DNA sequence encodings from data, applying it to a dataset of tens of thousands of images. We test our design in the wetlab using one hundred target images and ten query images, and show that our database is capable of performing similarity-based enrichment: on average, visually similar images account for 30% of the sequencing reads for each query, despite making up only 10% of the database.
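The enrichment figure reported above can be read as a simple ratio: a subset's share of the sequencing reads divided by its share of the database. The following is a minimal sketch of that arithmetic only, not the authors' analysis pipeline; the function name `fold_enrichment` is hypothetical, and the inputs are the averages quoted in the abstract (30% of reads, 10% of the database).

```python
def fold_enrichment(read_fraction: float, database_fraction: float) -> float:
    """Share of sequencing reads divided by share of the database.

    A value above 1 means the subset is over-represented in the reads
    relative to its abundance in the database.
    """
    if not 0 < database_fraction <= 1:
        raise ValueError("database_fraction must be in (0, 1]")
    if not 0 <= read_fraction <= 1:
        raise ValueError("read_fraction must be in [0, 1]")
    return read_fraction / database_fraction


# Averages quoted in the abstract (illustrative inputs, not raw data):
# visually similar images are ~30% of reads but only ~10% of the database.
enrichment = fold_enrichment(read_fraction=0.30, database_fraction=0.10)
print(f"Fold enrichment: {enrichment:.1f}x")
```

With the abstract's averages, the ratio comes out to roughly threefold, meaning that similar images appear in the reads about three times as often as random retrieval would predict.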

Ultra-dense data storage and extreme parallelism with electronic-molecular systems

Sustaining growth in storage and computational needs is increasingly challenging. For over a decade, exponentially more information has been produced year after year, while data storage solutions are pressed to keep up. Soon, current solutions will be unable to keep pace with the volume of new information that needs to be stored. Computing is on a similar trajectory, with new needs emerging in search and other domains that require more efficient systems. Innovative methods are needed to meet future demands, and DNA offers an opportunity at the molecular level for ultra-dense, durable, and sustainable solutions in these areas. In this webinar, join Microsoft researcher Karin Strauss in exploring the role of biotechnology and synthetic DNA in reaching this goal. Although we have yet to achieve scalable, general-purpose…