Effective binning of metagenomic contigs using contrastive multi-view representation learning

Abstract Contig binning plays a crucial role in metagenomic data analysis by grouping contigs from the same or closely related genomes. However, existing binning methods face challenges in practical applications due to the diversity of data types and the difficulties in efficiently integrating heter...

Bibliographic Details
Main Authors: Ziye Wang, Ronghui You, Haitao Han, Wei Liu, Fengzhu Sun, Shanfeng Zhu
Format: Article
Language: English
Published: Nature Portfolio 2024-01-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-023-44290-z
author Ziye Wang
Ronghui You
Haitao Han
Wei Liu
Fengzhu Sun
Shanfeng Zhu
collection DOAJ
description Abstract Contig binning plays a crucial role in metagenomic data analysis by grouping contigs from the same or closely related genomes. However, existing binning methods face challenges in practical applications due to the diversity of data types and the difficulties in efficiently integrating heterogeneous information. Here, we introduce COMEBin, a binning method based on contrastive multi-view representation learning. COMEBin utilizes data augmentation to generate multiple fragments (views) of each contig and obtains high-quality embeddings of heterogeneous features (sequence coverage and k-mer distribution) through contrastive learning. Experimental results on multiple simulated and real datasets demonstrate that COMEBin outperforms state-of-the-art binning methods, particularly in recovering near-complete genomes from real environmental samples. COMEBin outperforms other binning methods remarkably when integrated into metagenomic analysis pipelines, including the recovery of potentially pathogenic antibiotic-resistant bacteria (PARB) and moderate or higher quality bins containing potential biosynthetic gene clusters (BGCs).
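The description above mentions generating multiple fragments (views) of each contig and using k-mer distribution as one of the features. The following is a minimal Python sketch of that general idea, not COMEBin's actual implementation: the function names `make_views` and `kmer_profile`, the minimum fragment length, the number of views, and the plain 256-dimensional tetranucleotide vector are all illustrative assumptions. Coverage features are omitted because they require read alignments, and the contrastive training step itself is not shown.

```python
import random
from itertools import product


def kmer_profile(seq, k=4):
    """Return a normalized k-mer frequency vector over the ACGT alphabet."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # k-mers containing N etc. are skipped
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def make_views(contig, n_views=3, min_len=1000, rng=None):
    """Return the original contig plus n_views random sub-fragments.

    Hypothetical augmentation: each view is a random substring of at
    least min_len bases; contigs no longer than min_len are reused whole.
    """
    if rng is None:
        rng = random.Random(0)
    views = [contig]
    for _ in range(n_views):
        if len(contig) <= min_len:
            views.append(contig)
            continue
        length = rng.randint(min_len, len(contig))
        start = rng.randint(0, len(contig) - length)
        views.append(contig[start:start + length])
    return views


# Usage: build views of a synthetic contig and featurize each one.
rng = random.Random(42)
contig = "".join(rng.choice("ACGT") for _ in range(5000))
views = make_views(contig, n_views=3, min_len=1000, rng=rng)
profiles = [kmer_profile(v) for v in views]
```

In a contrastive setup, the feature vectors of views drawn from the same contig would be treated as positive pairs and those from different contigs as negatives; how the published method encodes and combines the features is described in the article itself.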
first_indexed 2024-03-08T12:36:59Z
format Article
id doaj.art-fcc9b203a74f44adbdec007db0e67bca
institution Directory Open Access Journal
issn 2041-1723
language English
last_indexed 2024-03-08T12:36:59Z
publishDate 2024-01-01
publisher Nature Portfolio
record_format Article
series Nature Communications
title Effective binning of metagenomic contigs using contrastive multi-view representation learning
url https://doi.org/10.1038/s41467-023-44290-z