ComHapDet: a spatial community detection algorithm for haplotype assembly

Abstract Background Haplotypes, the ordered lists of single nucleotide variations that distinguish chromosomal sequences from their homologous pairs, may reveal an individual’s susceptibility to hereditary and complex diseases and affect how our bodies respond to therapeutic drugs. Reconstructing ha...

Full description

Bibliographic Details
Main Authors: Abishek Sankararaman, Haris Vikalo, François Baccelli
Format: Article
Language:English
Published: BMC 2020-09-01
Series:BMC Genomics
Subjects:
Online Access:http://link.springer.com/article/10.1186/s12864-020-06935-x
_version_ 1818208264862564352
author Abishek Sankararaman
Haris Vikalo
François Baccelli
author_facet Abishek Sankararaman
Haris Vikalo
François Baccelli
author_sort Abishek Sankararaman
collection DOAJ
description Abstract Background Haplotypes, the ordered lists of single nucleotide variations that distinguish chromosomal sequences from their homologous pairs, may reveal an individual’s susceptibility to hereditary and complex diseases and affect how our bodies respond to therapeutic drugs. Reconstructing haplotypes of an individual from short sequencing reads is an NP-hard problem that becomes even more challenging in the case of polyploids. While increasing lengths of sequencing reads and insert sizes helps improve accuracy of reconstruction, it also exacerbates computational complexity of the haplotype assembly task. This has motivated the pursuit of algorithmic frameworks capable of accurate yet efficient assembly of haplotypes from high-throughput sequencing data. Results We propose a novel graphical representation of sequencing reads and pose the haplotype assembly problem as an instance of community detection on a spatial random graph. To this end, we construct a graph where each read is a node with an unknown community label associating the read with the haplotype it samples. Haplotype reconstruction can then be thought of as a two-step procedure: first, one recovers the community labels on the nodes (i.e., the reads), and then uses the estimated labels to assemble the haplotypes. Based on this observation, we propose ComHapDet – a novel assembly algorithm for diploid and ployploid haplotypes which allows both bialleleic and multi-allelic variants. Conclusions Performance of the proposed algorithm is benchmarked on simulated as well as experimental data obtained by sequencing Chromosome 5 of tetraploid biallelic Solanum-Tuberosum (Potato). The results demonstrate the efficacy of the proposed method and that it compares favorably with the existing techniques.
first_indexed 2024-12-12T04:42:04Z
format Article
id doaj.art-1096ed6c0a0a432db67b8c1e51efd7fd
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-12-12T04:42:04Z
publishDate 2020-09-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-1096ed6c0a0a432db67b8c1e51efd7fd2022-12-22T00:37:45ZengBMCBMC Genomics1471-21642020-09-0121S911410.1186/s12864-020-06935-xComHapDet: a spatial community detection algorithm for haplotype assemblyAbishek Sankararaman0Haris Vikalo1François Baccelli2Department of Electrical and Computer Engineering, The University of Texas at AustinDepartment of Electrical and Computer Engineering, The University of Texas at AustinDepartment of Electrical and Computer Engineering, The University of Texas at AustinAbstract Background Haplotypes, the ordered lists of single nucleotide variations that distinguish chromosomal sequences from their homologous pairs, may reveal an individual’s susceptibility to hereditary and complex diseases and affect how our bodies respond to therapeutic drugs. Reconstructing haplotypes of an individual from short sequencing reads is an NP-hard problem that becomes even more challenging in the case of polyploids. While increasing lengths of sequencing reads and insert sizes helps improve accuracy of reconstruction, it also exacerbates computational complexity of the haplotype assembly task. This has motivated the pursuit of algorithmic frameworks capable of accurate yet efficient assembly of haplotypes from high-throughput sequencing data. Results We propose a novel graphical representation of sequencing reads and pose the haplotype assembly problem as an instance of community detection on a spatial random graph. To this end, we construct a graph where each read is a node with an unknown community label associating the read with the haplotype it samples. Haplotype reconstruction can then be thought of as a two-step procedure: first, one recovers the community labels on the nodes (i.e., the reads), and then uses the estimated labels to assemble the haplotypes. Based on this observation, we propose ComHapDet – a novel assembly algorithm for diploid and ployploid haplotypes which allows both bialleleic and multi-allelic variants. Conclusions Performance of the proposed algorithm is benchmarked on simulated as well as experimental data obtained by sequencing Chromosome 5 of tetraploid biallelic Solanum-Tuberosum (Potato). The results demonstrate the efficacy of the proposed method and that it compares favorably with the existing techniques.http://link.springer.com/article/10.1186/s12864-020-06935-xHaplotype assemblySpatial random graphGraph clustering
spellingShingle Abishek Sankararaman
Haris Vikalo
François Baccelli
ComHapDet: a spatial community detection algorithm for haplotype assembly
BMC Genomics
Haplotype assembly
Spatial random graph
Graph clustering
title ComHapDet: a spatial community detection algorithm for haplotype assembly
title_full ComHapDet: a spatial community detection algorithm for haplotype assembly
title_fullStr ComHapDet: a spatial community detection algorithm for haplotype assembly
title_full_unstemmed ComHapDet: a spatial community detection algorithm for haplotype assembly
title_short ComHapDet: a spatial community detection algorithm for haplotype assembly
title_sort comhapdet a spatial community detection algorithm for haplotype assembly
topic Haplotype assembly
Spatial random graph
Graph clustering
url http://link.springer.com/article/10.1186/s12864-020-06935-x
work_keys_str_mv AT abisheksankararaman comhapdetaspatialcommunitydetectionalgorithmforhaplotypeassembly
AT harisvikalo comhapdetaspatialcommunitydetectionalgorithmforhaplotypeassembly
AT francoisbaccelli comhapdetaspatialcommunitydetectionalgorithmforhaplotypeassembly