GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species
Abstract Whole-genome sequencing projects of millions of subjects contain enormous genotypes, entailing a huge memory burden and time for computation. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parall...
Main Authors: | Liubin Zhang, Yangyang Yuan, Wenjie Peng, Bin Tang, Mulin Jun Li, Hongsheng Gui, Qiang Wang, Miaoxin Li |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2023-04-01 |
Series: | Genome Biology |
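Subjects: | Large-scale genotypes; Genotype compression; Highly addressable genotype blocks; Byte-encoding genotypes; Genotype management; Parallelization algorithm |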
Online Access: | https://doi.org/10.1186/s13059-023-02906-z |
_version_ | 1797841037442416640 |
---|---|
author | Liubin Zhang; Yangyang Yuan; Wenjie Peng; Bin Tang; Mulin Jun Li; Hongsheng Gui; Qiang Wang; Miaoxin Li |
author_sort | Liubin Zhang |
collection | DOAJ |
description | Abstract Whole-genome sequencing projects of millions of subjects contain enormous genotypes, entailing a huge memory burden and time for computation. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods at accessing and managing compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analysis would be substantially sped up if built on GBC to access genotypes of a large population. GBC’s data structure and algorithms are valuable for accelerating large-scale genomic research. |
first_indexed | 2024-04-09T16:24:26Z |
format | Article |
id | doaj.art-d4f6f3be3aab4bcba4e0de1f78462441 |
institution | Directory Open Access Journal |
issn | 1474-760X |
language | English |
last_indexed | 2024-04-09T16:24:26Z |
publishDate | 2023-04-01 |
publisher | BMC |
record_format | Article |
series | Genome Biology |
spelling | doaj.art-d4f6f3be3aab4bcba4e0de1f78462441; 2023-04-23T11:18:56Z; eng; BMC; Genome Biology; 1474-760X; 2023-04-01; 241122; 10.1186/s13059-023-02906-z; GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species; Authors: Liubin Zhang [0], Yangyang Yuan [1], Wenjie Peng [2], Bin Tang [3], Mulin Jun Li [4], Hongsheng Gui [5], Qiang Wang [6], Miaoxin Li [7]; Affiliations: [0] Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-Sen University; [1] Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-Sen University; [2] Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-Sen University; [3] Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-Sen University; [4] The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University; [5] Behavioral Health Services, Henry Ford Health; [6] Mental Health Center, West China Hospital, Sichuan University; [7] Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-Sen University; Abstract Whole-genome sequencing projects of millions of subjects contain enormous genotypes, entailing a huge memory burden and time for computation. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods at accessing and managing compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analysis would be substantially sped up if built on GBC to access genotypes of a large population. GBC’s data structure and algorithms are valuable for accelerating large-scale genomic research.; https://doi.org/10.1186/s13059-023-02906-z; Keywords: Large-scale genotypes; Genotype compression; Highly addressable genotype blocks; Byte-encoding genotypes; Genotype management; Parallelization algorithm |
title | GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species |
topic | Large-scale genotypes; Genotype compression; Highly addressable genotype blocks; Byte-encoding genotypes; Genotype management; Parallelization algorithm |
url | https://doi.org/10.1186/s13059-023-02906-z |