Trellis BMA: coded trace reconstruction on IDS channels for DNA storage

International Symposium on Information Theory (ISIT)

Sequencing a DNA strand, as part of the read process in DNA storage, produces multiple noisy copies (traces) which can be combined to produce a better estimate of the original strand; this is called trace reconstruction. One can reduce the error rate further by introducing redundancy into the written sequence; this is called coded trace reconstruction. In this paper, we model the DNA storage channel as an insertion-deletion-substitution (IDS) channel and design both encoding schemes and low-complexity decoding algorithms for coded trace reconstruction.
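To make the channel model concrete, the following is a minimal sketch of how noisy traces of a strand might be simulated on an IDS channel. It is an illustration only, not the model or code used in the paper: the function names `ids_channel` and `make_traces`, the independent per-base error events, and the default probabilities are all assumptions chosen for the example.

```python
import random

ALPHABET = "ACGT"


def ids_channel(strand, p_ins=0.01, p_del=0.01, p_sub=0.01, rng=None):
    """Return one noisy trace of `strand` (hypothetical i.i.d. IDS model).

    For each input base, a random base may first be inserted; the input
    base is then deleted, substituted, or copied unchanged.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for base in strand:
        # Insertion: emit an extra uniformly random base before this one.
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:
            # Deletion: the input base is dropped.
            continue
        if r < p_del + p_sub:
            # Substitution: replace with one of the three other bases.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            # No error: copy the base through.
            out.append(base)
    return "".join(out)


def make_traces(strand, num_traces, seed=None, **channel_params):
    """Generate `num_traces` independent traces of the same strand."""
    rng = random.Random(seed)
    return [ids_channel(strand, rng=rng, **channel_params)
            for _ in range(num_traces)]
```

Calling `make_traces("ACGTACGTAC", 5, seed=0, p_del=0.05)` would yield five traces of varying length, which is the kind of input a trace reconstruction algorithm has to combine into a single estimate.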

We introduce Trellis BMA, a new reconstruction algorithm whose complexity is linear in the number of traces, and compare its performance to previous algorithms. Our results show that it reduces the error rate on both simulated and experimental data. The performance comparisons in this paper are based on the Clustered Nanopore Reads Dataset publicly released with this paper. Our hope is that this dataset will enable research progress by allowing objective comparisons between candidate algorithms.

Publication Downloads

Clustered Nanopore Reads (CNR) Dataset

May 7, 2021

DNA storage aims to store information in the form of DNA sequences. This is a research project at Microsoft Research Redmond. This repo contains a dataset of real DNA sequences which can be used for benchmarking different trace reconstruction algorithms; it contains no code. We release the dataset of clustered nanopore DNA reads together with our paper, "Trellis BMA: coded trace reconstruction on IDS channels for DNA storage".