PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning

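PacBio long-read sequencing offers several potential advantages for DNA assembly, including more complete gene profiling of metagenomic samples. However, lower single-pass accuracy can make gene discovery and assembly for low-abundance organisms difficult. To evaluate the application and performance of PacBio long reads and Illumina HiSeq short reads in metagenomic analyses, we directly compared various assemblies of PacBio and Illumina sequencing reads from two anaerobic digestion microbiome samples taken from a biogas fermenter. A PacBio platform produced 1.58 million long reads (19.6 Gb) with an average length of 7,604 bp. An Illumina HiSeq platform produced 151.2 million read pairs (45.4 Gb). Hybrid assemblies combining PacBio long reads with HiSeq contigs improved assembly statistics, increasing the average contig length, contig N50 size, and number of large contigs. Interestingly, depth-based hybrid assemblies generated a higher percentage of complete genes (98.86%) than those based on HiSeq contigs only (40.29%), because the PacBio reads were long enough to cover many repeating short elements and capture multiple genes in a single read. Additionally, incorporating PacBio long reads considerably reduced contig numbers and increased the completeness of the genome reconstruction, which was poorly assembled and binned when using HiSeq data alone. From this comparison of PacBio long reads with Illumina HiSeq short reads on complex microbiome samples, we conclude that PacBio long reads can produce longer contigs, more complete genes, and better genome binning, thereby offering more information about metagenomic samples.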
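
The abstract of this record compares assemblies by average contig length, contig N50 size, and number of large contigs. As a minimal sketch of how such statistics are commonly computed, the Python below takes a list of contig lengths and returns all three. The function names, the 1,000 bp cutoff for a "large" contig, and the toy length lists are assumptions made for illustration; none of them come from the article, and the record does not say which tool the authors used.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together hold at least half of all assembled bases (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def summarize(lengths, large_cutoff=1000):
    """Summarize an assembly from its contig lengths.

    large_cutoff is a hypothetical threshold (in bp) for counting a
    contig as "large"; it is not taken from the article.
    """
    count = len(lengths)
    return {
        "contigs": count,
        "mean_length": sum(lengths) / count if count else 0.0,
        "n50": n50(lengths),
        "large_contigs": sum(1 for x in lengths if x >= large_cutoff),
    }


if __name__ == "__main__":
    # Invented toy lengths in bp, NOT data from the study.
    short_read_only = [500, 800, 1200, 1500, 2000]
    hybrid = [3000, 6000, 9000]
    print("short-read only:", summarize(short_read_only))
    print("hybrid:", summarize(hybrid))
```

In this toy input the hybrid list has fewer contigs but a higher mean length and N50, which is the pattern the abstract describes qualitatively.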

Bibliographic Details
Main Authors: Haiying Xie, Caiyun Yang, Yamin Sun, Yasuo Igarashi, Tao Jin, Feng Luo
Format: Article
Language: English
Published: Frontiers Media S.A. 2020-09-01
Series: Frontiers in Genetics
Subjects: hybrid assembly; PacBio; gene catalog; anaerobic digestion; genome reconstruction
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2020.516269/full
_version_ 1819108668104245248
author Haiying Xie
Caiyun Yang
Yamin Sun
Yasuo Igarashi
Tao Jin
Feng Luo
collection DOAJ
description PacBio long-read sequencing offers several potential advantages for DNA assembly, including more complete gene profiling of metagenomic samples. However, lower single-pass accuracy can make gene discovery and assembly for low-abundance organisms difficult. To evaluate the application and performance of PacBio long reads and Illumina HiSeq short reads in metagenomic analyses, we directly compared various assemblies of PacBio and Illumina sequencing reads from two anaerobic digestion microbiome samples taken from a biogas fermenter. A PacBio platform produced 1.58 million long reads (19.6 Gb) with an average length of 7,604 bp. An Illumina HiSeq platform produced 151.2 million read pairs (45.4 Gb). Hybrid assemblies combining PacBio long reads with HiSeq contigs improved assembly statistics, increasing the average contig length, contig N50 size, and number of large contigs. Interestingly, depth-based hybrid assemblies generated a higher percentage of complete genes (98.86%) than those based on HiSeq contigs only (40.29%), because the PacBio reads were long enough to cover many repeating short elements and capture multiple genes in a single read. Additionally, incorporating PacBio long reads considerably reduced contig numbers and increased the completeness of the genome reconstruction, which was poorly assembled and binned when using HiSeq data alone. From this comparison of PacBio long reads with Illumina HiSeq short reads on complex microbiome samples, we conclude that PacBio long reads can produce longer contigs, more complete genes, and better genome binning, thereby offering more information about metagenomic samples.
first_indexed 2024-12-22T03:13:35Z
format Article
id doaj.art-718edcd2b3104f08aa7a0fe7c444c196
institution Directory of Open Access Journals
issn 1664-8021
language English
last_indexed 2024-12-22T03:13:35Z
publishDate 2020-09-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-718edcd2b3104f08aa7a0fe7c444c196
2022-12-21T18:40:52Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2020-09-01
Volume 11
DOI 10.3389/fgene.2020.516269
Article 516269
PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning
Haiying Xie: Research Center of Bioenergy and Bioremediation, College of Resources and Environment, Southwest University, Chongqing, China; PUROTON Gene Medical Institute Co., Ltd., Chongqing, China
Caiyun Yang: Research Center of Bioenergy and Bioremediation, College of Resources and Environment, Southwest University, Chongqing, China
Yamin Sun: Research Center for Functional Genomics and Biochip, Tianjin Biochip Co., Ltd., Tianjin, China
Yasuo Igarashi: Research Center of Bioenergy and Bioremediation, College of Resources and Environment, Southwest University, Chongqing, China
Tao Jin: The Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, China
Feng Luo: Research Center of Bioenergy and Bioremediation, College of Resources and Environment, Southwest University, Chongqing, China; PUROTON Gene Medical Institute Co., Ltd., Chongqing, China
title PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning
topic hybrid assembly
PacBio
gene catalog
anaerobic digestion
genome reconstruction
url https://www.frontiersin.org/article/10.3389/fgene.2020.516269/full