HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data

Bibliographic Details
Main Authors: Berger, Emily R, Yorukoglu, Deniz, Peng, Jian, Berger Leighton, Bonnie
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Published: Public Library of Science (PLoS) 2018
Online Access: http://hdl.handle.net/1721.1/116304
https://orcid.org/0000-0003-2315-0768
https://orcid.org/0000-0002-2724-7228
collection MIT
description As the more recent next-generation sequencing (NGS) technologies provide longer read sequences, the use of sequencing datasets for complete haplotype phasing is fast becoming a reality, allowing haplotype reconstruction of a single sequenced genome. Nearly all previous haplotype reconstruction studies have focused on diploid genomes and are rarely scalable to genomes with higher ploidy. Yet computational investigations into polyploid genomes carry great importance, impacting plant, yeast and fish genomics, as well as the studies of the evolution of modern-day eukaryotes and (epi)genetic interactions between copies of genes. In this paper, we describe a novel maximum-likelihood estimation framework, HapTree, for polyploid haplotype assembly of an individual genome using NGS read datasets. We evaluate the performance of HapTree on simulated polyploid sequencing read data modeled after Illumina sequencing technologies. For triploid and higher ploidy genomes, we demonstrate that HapTree substantially improves haplotype assembly accuracy and efficiency over the state-of-the-art; moreover, HapTree is the first scalable polyplotyping method for higher ploidy. As a proof of concept, we also test our method on real sequencing data from NA12878 (1000 Genomes Project) and evaluate the quality of assembled haplotypes with respect to trio-based diplotype annotation as the ground truth. The results indicate that HapTree significantly improves the switch accuracy within phased haplotype blocks as compared to existing haplotype assembly methods, while producing comparable minimum error correction (MEC) values. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5.
first_indexed 2024-09-23T16:19:54Z
format Article
id mit-1721.1/116304
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T16:19:54Z
publishDate 2018
publisher Public Library of Science (PLoS)
record_format dspace
spelling mit-1721.1/116304 2022-10-02T07:44:06Z
Massachusetts Institute of Technology. Department of Mathematics
2018-06-14T13:43:16Z 2018-06-14T13:43:16Z 2014-03 2013-10 2018-05-16T17:28:38Z
Article http://purl.org/eprint/type/JournalArticle
1553-7358
Berger, Emily, et al. “HapTree: A Novel Bayesian Framework for Single Individual Polyplotyping Using NGS Data.” PLoS Computational Biology, edited by Isidore Rigoutsos, vol. 10, no. 3, Mar. 2014, p. e1003502.
© 2014 Berger et al.
http://dx.doi.org/10.1371/JOURNAL.PCBI.1003502
PLoS Computational Biology
Creative Commons Attribution 4.0 International License http://creativecommons.org/licenses/by/4.0/
application/pdf