Effective binning of metagenomic contigs using contrastive multi-view representation learning
Abstract: Contig binning plays a crucial role in metagenomic data analysis by grouping contigs from the same or closely related genomes. However, existing binning methods face challenges in practical applications due to the diversity of data types and the difficulties in efficiently integrating heter...
Main Authors: | Ziye Wang, Ronghui You, Haitao Han, Wei Liu, Fengzhu Sun, Shanfeng Zhu |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2024-01-01 |
Series: | Nature Communications |
Online Access: | https://doi.org/10.1038/s41467-023-44290-z |
_version_ | 1797349823633948672 |
---|---|
author | Ziye Wang; Ronghui You; Haitao Han; Wei Liu; Fengzhu Sun; Shanfeng Zhu |
author_facet | Ziye Wang; Ronghui You; Haitao Han; Wei Liu; Fengzhu Sun; Shanfeng Zhu |
author_sort | Ziye Wang |
collection | DOAJ |
description | Abstract: Contig binning plays a crucial role in metagenomic data analysis by grouping contigs from the same or closely related genomes. However, existing binning methods face challenges in practical applications due to the diversity of data types and the difficulties in efficiently integrating heterogeneous information. Here, we introduce COMEBin, a binning method based on contrastive multi-view representation learning. COMEBin utilizes data augmentation to generate multiple fragments (views) of each contig and obtains high-quality embeddings of heterogeneous features (sequence coverage and k-mer distribution) through contrastive learning. Experimental results on multiple simulated and real datasets demonstrate that COMEBin outperforms state-of-the-art binning methods, particularly in recovering near-complete genomes from real environmental samples. COMEBin outperforms other binning methods remarkably when integrated into metagenomic analysis pipelines, including the recovery of potentially pathogenic antibiotic-resistant bacteria (PARB) and moderate or higher quality bins containing potential biosynthetic gene clusters (BGCs). |
first_indexed | 2024-03-08T12:36:59Z |
format | Article |
id | doaj.art-fcc9b203a74f44adbdec007db0e67bca |
institution | Directory Open Access Journal |
issn | 2041-1723 |
language | English |
last_indexed | 2024-03-08T12:36:59Z |
publishDate | 2024-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Nature Communications |
spelling | doaj.art-fcc9b203a74f44adbdec007db0e67bca; 2024-01-21T12:27:29Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2024-01-01; vol. 15, no. 1, pp. 1-14; 10.1038/s41467-023-44290-z; Effective binning of metagenomic contigs using contrastive multi-view representation learning; Ziye Wang (0), Ronghui You (1), Haitao Han (2), Wei Liu (3), Fengzhu Sun (4), Shanfeng Zhu (5); (0) Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University; (1) Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University; (2) Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University; (3) Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University; (4) Department of Quantitative and Computational Biology, University of Southern California; (5) Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University; Abstract: Contig binning plays a crucial role in metagenomic data analysis by grouping contigs from the same or closely related genomes. However, existing binning methods face challenges in practical applications due to the diversity of data types and the difficulties in efficiently integrating heterogeneous information. Here, we introduce COMEBin, a binning method based on contrastive multi-view representation learning. COMEBin utilizes data augmentation to generate multiple fragments (views) of each contig and obtains high-quality embeddings of heterogeneous features (sequence coverage and k-mer distribution) through contrastive learning. Experimental results on multiple simulated and real datasets demonstrate that COMEBin outperforms state-of-the-art binning methods, particularly in recovering near-complete genomes from real environmental samples. COMEBin outperforms other binning methods remarkably when integrated into metagenomic analysis pipelines, including the recovery of potentially pathogenic antibiotic-resistant bacteria (PARB) and moderate or higher quality bins containing potential biosynthetic gene clusters (BGCs).; https://doi.org/10.1038/s41467-023-44290-z |
spellingShingle | Ziye Wang; Ronghui You; Haitao Han; Wei Liu; Fengzhu Sun; Shanfeng Zhu; Effective binning of metagenomic contigs using contrastive multi-view representation learning; Nature Communications |
title | Effective binning of metagenomic contigs using contrastive multi-view representation learning |
title_full | Effective binning of metagenomic contigs using contrastive multi-view representation learning |
title_fullStr | Effective binning of metagenomic contigs using contrastive multi-view representation learning |
title_full_unstemmed | Effective binning of metagenomic contigs using contrastive multi-view representation learning |
title_short | Effective binning of metagenomic contigs using contrastive multi-view representation learning |
title_sort | effective binning of metagenomic contigs using contrastive multi view representation learning |
url | https://doi.org/10.1038/s41467-023-44290-z |
work_keys_str_mv | AT ziyewang effectivebinningofmetagenomiccontigsusingcontrastivemultiviewrepresentationlearning AT ronghuiyou effectivebinningofmetagenomiccontigsusingcontrastivemultiviewrepresentationlearning AT haitaohan effectivebinningofmetagenomiccontigsusingcontrastivemultiviewrepresentationlearning AT weiliu effectivebinningofmetagenomiccontigsusingcontrastivemultiviewrepresentationlearning AT fengzhusun effectivebinningofmetagenomiccontigsusingcontrastivemultiviewrepresentationlearning AT shanfengzhu effectivebinningofmetagenomiccontigsusingcontrastivemultiviewrepresentationlearning |
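
The abstract describes generating multiple fragments (views) of each contig and learning from features such as the k-mer distribution. The following is a minimal illustrative sketch of that idea only, not COMEBin's actual implementation: the function names `kmer_profile` and `make_views`, the choice of k = 4, the number of views, the minimum fragment length, and the toy sequence are all assumptions introduced here. In a contrastive setup, views drawn from the same contig would typically be treated as positive pairs; the training step itself is not shown.

```python
import random
from itertools import product


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies over the ACGT alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def make_views(contig, n_views=3, min_len=50, seed=0):
    """Return the full contig plus n_views random fragments of it."""
    rng = random.Random(seed)
    views = [contig]
    for _ in range(n_views):
        # Fragment length is drawn between min_len and the contig length.
        length = rng.randint(min(min_len, len(contig)), len(contig))
        start = rng.randint(0, len(contig) - length)
        views.append(contig[start:start + length])
    return views


contig = "ACGTTGCAAGCT" * 20  # toy sequence (assumption), not real data
views = make_views(contig)
profiles = [kmer_profile(v) for v in views]
```

Each entry of `profiles` is a 256-dimensional frequency vector for one view. A real pipeline would also need per-sample coverage for each view, which this sketch leaves out.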