HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data
As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studi...
Main Authors: | Emily Berger, Deniz Yorukoglu, Jian Peng, Bonnie Berger |
---|---|
Other Authors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
Format: | Article |
Language: | en_US |
Published: | Public Library of Science 2014 |
Online Access: | http://hdl.handle.net/1721.1/86088 https://orcid.org/0000-0003-2315-0768 https://orcid.org/0000-0002-2724-7228 |
_version_ | 1826203551075926016 |
---|---|
author | Berger, Emily Yorukoglu, Deniz Peng, Jian Berger, Bonnie |
author2 | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
author_sort | Berger, Emily |
collection | MIT |
description | As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studies have focused on diploid genomes and are rarely scalable to genomes with higher ploidy. Yet computational investigations into polyploid genomes carry great importance, impacting plant, yeast and fish genomics, as well as the studies of the evolution of modern-day eukaryotes and (epi)genetic interactions between copies of genes. In this paper, we describe a novel maximum-likelihood estimation framework, HapTree, for polyploid haplotype assembly of an individual genome using NGS read datasets. We evaluate the performance of HapTree on simulated polyploid sequencing read data modeled after Illumina sequencing technologies. For triploid and higher ploidy genomes, we demonstrate that HapTree substantially improves haplotype assembly accuracy and efficiency over the state-of-the-art; moreover, HapTree is the first scalable polyplotyping method for higher ploidy. As a proof of concept, we also test our method on real sequencing data from NA12878 (1000 Genomes Project) and evaluate the quality of assembled haplotypes with respect to trio-based diplotype annotation as the ground truth. The results indicate that HapTree significantly improves the switch accuracy within phased haplotype blocks as compared to existing haplotype assembly methods, while producing comparable minimum error correction (MEC) values. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5. |
first_indexed | 2024-09-23T12:38:48Z |
format | Article |
id | mit-1721.1/86088 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T12:38:48Z |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | dspace |
spelling | mit-1721.1/86088 2022-10-01T10:16:05Z HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data Berger, Emily Yorukoglu, Deniz Peng, Jian Berger, Bonnie Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. Department of Mathematics Berger, Bonnie Yorukoglu, Deniz Peng, Jian Berger, Emily National Science Foundation (U.S.) (NSF/NIH BIGDATA Grant R01GM108348-01) National Science Foundation (U.S.) (Graduate Research Fellowship) Simons Foundation 2014-04-09T20:14:28Z 2014-04-09T20:14:28Z 2014-03 2013-10 Article http://purl.org/eprint/type/JournalArticle 1553-7358 http://hdl.handle.net/1721.1/86088 Berger, Emily, Deniz Yorukoglu, Jian Peng, and Bonnie Berger. “HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data.” Edited by Isidore Rigoutsos. PLoS Comput Biol 10, no. 3 (March 27, 2014): e1003502. https://orcid.org/0000-0003-2315-0768 https://orcid.org/0000-0002-2724-7228 en_US http://dx.doi.org/10.1371/journal.pcbi.1003502 PLoS Computational Biology Creative Commons Attribution http://creativecommons.org/licenses/by/4.0/ application/pdf Public Library of Science PLoS |
title | HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data |
url | http://hdl.handle.net/1721.1/86088 https://orcid.org/0000-0003-2315-0768 https://orcid.org/0000-0002-2724-7228 |