PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning
PacBio long-read sequencing presents several potential advantages for DNA assembly, including being able to provide more complete gene profiling of metagenomic samples. However, lower single-pass accuracy can make gene discovery and assembly for low-abundance organisms difficult. To evaluate the ap...
Main Authors: | Haiying Xie, Caiyun Yang, Yamin Sun, Yasuo Igarashi, Tao Jin, Feng Luo |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2020-09-01 |
Series: | Frontiers in Genetics |
Subjects: | hybrid assembly; PacBio; gene catalog; anaerobic digestion; genome reconstruction |
Online Access: | https://www.frontiersin.org/article/10.3389/fgene.2020.516269/full |
_version_ | 1819108668104245248 |
---|---|
author | Haiying Xie; Caiyun Yang; Yamin Sun; Yasuo Igarashi; Tao Jin; Feng Luo |
author_sort | Haiying Xie |
collection | DOAJ |
description | PacBio long-read sequencing presents several potential advantages for DNA assembly, including being able to provide more complete gene profiling of metagenomic samples. However, lower single-pass accuracy can make gene discovery and assembly for low-abundance organisms difficult. To evaluate the application and performance of PacBio long reads and Illumina HiSeq short reads in metagenomic analyses, we directly compared various assemblies involving PacBio and Illumina sequencing reads based on two anaerobic digestion microbiome samples from a biogas fermenter. Using a PacBio platform, 1.58 million long reads (19.6 Gb) were produced with an average length of 7,604 bp. Using an Illumina HiSeq platform, 151.2 million read pairs (45.4 Gb) were produced. Hybrid assemblies using PacBio long reads and HiSeq contigs produced improvements in assembly statistics, including an increase in the average contig length, contig N50 size, and number of large contigs. Interestingly, depth-based hybrid assemblies generated a higher percentage of complete genes (98.86%) compared to those based on HiSeq contigs only (40.29%), because the PacBio reads were long enough to cover many repeating short elements and capture multiple genes in a single read. Additionally, the incorporation of PacBio long reads led to considerable advantages in reducing contig numbers and increasing the completeness of the genome reconstruction, which was poorly assembled and binned when using HiSeq data alone. From this comparison of PacBio long reads with Illumina HiSeq short reads on complex microbiome samples, we conclude that PacBio long reads can produce longer contigs, more complete genes, and better genome binning, thereby offering more information about metagenomic samples. |
first_indexed | 2024-12-22T03:13:35Z |
format | Article |
id | doaj.art-718edcd2b3104f08aa7a0fe7c444c196 |
institution | Directory of Open Access Journals |
issn | 1664-8021 |
language | English |
last_indexed | 2024-12-22T03:13:35Z |
publishDate | 2020-09-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-718edcd2b3104f08aa7a0fe7c444c196; 2022-12-21T18:40:52Z; eng; Frontiers Media S.A.; Frontiers in Genetics; ISSN 1664-8021; 2020-09-01; vol. 11; DOI 10.3389/fgene.2020.516269; article 516269; PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning; Haiying Xie (0, 1); Caiyun Yang (2); Yamin Sun (3); Yasuo Igarashi (4); Tao Jin (5); Feng Luo (6, 7); (0) Research Center of Bioenergy and Bioremediation, College of Resources and Environment, Southwest University, Chongqing, China; (1) PUROTON Gene Medical Institute Co., Ltd., Chongqing, China; (2) Research Center of Bioenergy and Bioremediation, College of Resources and Environment, Southwest University, Chongqing, China; (3) Research Center for Functional Genomics and Biochip, Tianjin Biochip Co., Ltd., Tianjin, China; (4) Research Center of Bioenergy and Bioremediation, College of Resources and Environment, Southwest University, Chongqing, China; (5) The Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, China; (6) Research Center of Bioenergy and Bioremediation, College of Resources and Environment, Southwest University, Chongqing, China; (7) PUROTON Gene Medical Institute Co., Ltd., Chongqing, China |
title | PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning |
topic | hybrid assembly; PacBio; gene catalog; anaerobic digestion; genome reconstruction |
url | https://www.frontiersin.org/article/10.3389/fgene.2020.516269/full |