Joint Base-Calling of Two DNA Sequences with Factor Graphs

Bibliographic Details
Main Authors: Shi, Xiaomeng; Lun, Desmond S.; Medard, Muriel; Koetter, Ralf; Meldrim, James C.; Barry, Andrew James
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: en_US
Published: Institute of Electrical and Electronics Engineers 2011
Online Access: http://hdl.handle.net/1721.1/62009
https://orcid.org/0000-0003-4059-407X
Description
Summary: Automated estimation of DNA base sequences is an important step in genomics and in many other emerging fields in the biological and medical sciences. Current automated sequencers process single strands only. To improve the utility of existing technologies, we propose to mix two independent strands prior to electrophoresis and to base-call them jointly by applying the sum-product algorithm on factor graphs. We first present a statistical model for DNA sequencing data and examine the model parameters. A practical heuristic is then proposed to estimate the peaks, which are separated into two source sequences (major and minor) by passing messages on a factor graph. Simulation results show that joint base-calling can provide less accurate but valid results for the minor sequence. The algorithm presented provides a basis for future investigation of joint sequencing techniques.
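
The message-passing step mentioned in the summary can be illustrated with a toy sketch. This is not the authors' statistical model: it assumes a single peak position, a four-channel amplitude observation, fixed major and minor amplitudes, and Gaussian noise. The names `likelihood`, `joint_call`, `major_amp`, `minor_amp`, and `sigma` are all hypothetical. The factor graph here has two variables (the major and minor base) joined by one likelihood factor, so the sum-product messages give exact marginals.

```python
import math

BASES = "ACGT"

def likelihood(obs, x, y, major_amp=1.0, minor_amp=0.5, sigma=0.2):
    """Toy factor f(x, y): Gaussian likelihood of the four channel
    amplitudes given major base index x and minor base index y.
    All parameter values are illustrative assumptions."""
    expected = [0.0] * 4
    expected[x] += major_amp
    expected[y] += minor_amp
    sq_err = sum((o - e) ** 2 for o, e in zip(obs, expected))
    return math.exp(-sq_err / (2 * sigma ** 2))

def normalize(msg):
    total = sum(msg)
    return [m / total for m in msg]

def joint_call(obs, prior_major=None, prior_minor=None):
    """Sum-product on a two-variable factor graph: each variable sends
    its prior to the factor, and the factor returns a message obtained
    by summing out the other variable."""
    prior_major = prior_major or [0.25] * 4
    prior_minor = prior_minor or [0.25] * 4
    f = [[likelihood(obs, x, y) for y in range(4)] for x in range(4)]
    # Factor-to-variable messages (marginalize over the other variable).
    msg_to_major = [sum(f[x][y] * prior_minor[y] for y in range(4))
                    for x in range(4)]
    msg_to_minor = [sum(f[x][y] * prior_major[x] for x in range(4))
                    for y in range(4)]
    # Beliefs are the product of incoming messages, normalized.
    belief_major = normalize([prior_major[x] * msg_to_major[x] for x in range(4)])
    belief_minor = normalize([prior_minor[y] * msg_to_minor[y] for y in range(4)])
    return belief_major, belief_minor

# Hypothetical observation: strong signal in channel A, weaker in channel G.
obs = [1.05, 0.02, 0.45, -0.03]
bm, bn = joint_call(obs)
major = BASES[max(range(4), key=bm.__getitem__)]
minor = BASES[max(range(4), key=bn.__getitem__)]
print("major:", major, "minor:", minor)
```

A realistic base-caller would chain many such positions together and estimate the amplitudes and noise from data; the sketch only shows how one factor passes messages to two base variables.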