plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph
Identification of plasmids from sequencing data is an important and challenging problem related to antimicrobial resistance spread and other One-Health issues. We provide a new architecture for identifying plasmid contigs in fragmented genome assemblies built from short-read data. We employ graph ne...
Main Authors: | Janik Sielemann, Katharina Sielemann, Broňa Brejová, Tomáš Vinař, Cedric Chauve |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2023-10-01 |
Series: | Frontiers in Microbiology |
Subjects: | bioinformatics, machine learning (ML), classification, plasmids, assembly graph |
Online Access: | https://www.frontiersin.org/articles/10.3389/fmicb.2023.1267695/full |
_version_ | 1797665275199356928 |
---|---|
author | Janik Sielemann Katharina Sielemann Broňa Brejová Tomáš Vinař Cedric Chauve |
author_facet | Janik Sielemann Katharina Sielemann Broňa Brejová Tomáš Vinař Cedric Chauve |
author_sort | Janik Sielemann |
collection | DOAJ |
description | Identification of plasmids from sequencing data is an important and challenging problem related to antimicrobial resistance spread and other One-Health issues. We provide a new architecture for identifying plasmid contigs in fragmented genome assemblies built from short-read data. We employ graph neural networks (GNNs) and the assembly graph to propagate the information from nearby nodes, which leads to more accurate classification, especially for short contigs that are difficult to classify based on sequence features or database searches alone. We trained plASgraph2 on a data set of samples from the ESKAPEE group of pathogens. plASgraph2 either outperforms or performs on par with a wide range of state-of-the-art methods on testing sets of independent ESKAPEE samples and samples from related pathogens. On one hand, our study provides a new accurate and easy to use tool for contig classification in bacterial isolates; on the other hand, it serves as a proof-of-concept for the use of GNNs in genomics. Our software is available at https://github.com/cchauve/plasgraph2 and the training and testing data sets are available at https://github.com/fmfi-compbio/plasgraph2-datasets. |
first_indexed | 2024-03-11T19:41:37Z |
format | Article |
id | doaj.art-8eb508d1a23c4ed18ec5b9795241fa83 |
institution | Directory of Open Access Journals |
issn | 1664-302X |
language | English |
last_indexed | 2024-03-11T19:41:37Z |
publishDate | 2023-10-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Microbiology |
spelling | doaj.art-8eb508d1a23c4ed18ec5b9795241fa83; 2023-10-06T09:50:58Z; eng; Frontiers Media S.A.; Frontiers in Microbiology; 1664-302X; 2023-10-01; 14; 10.3389/fmicb.2023.1267695; 1267695; plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph; Janik Sielemann (0), Katharina Sielemann (1), Broňa Brejová (2), Tomáš Vinař (3), Cedric Chauve (4); (0) Computational Biology, Faculty of Biology, Center for Biotechnology & Graduate School Digital Infrastructures for the Life Sciences (DILS), Bielefeld Institute for Bioinformatics Infrastructure, Bielefeld University, Bielefeld, Germany; (1) Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology & Graduate School Digital Infrastructures for the Life Sciences (DILS), Bielefeld Institute for Bioinformatics Infrastructure, Bielefeld University, Bielefeld, Germany; (2) Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia; (3) Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia; (4) Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada; Identification of plasmids from sequencing data is an important and challenging problem related to antimicrobial resistance spread and other One-Health issues. We provide a new architecture for identifying plasmid contigs in fragmented genome assemblies built from short-read data. We employ graph neural networks (GNNs) and the assembly graph to propagate the information from nearby nodes, which leads to more accurate classification, especially for short contigs that are difficult to classify based on sequence features or database searches alone. We trained plASgraph2 on a data set of samples from the ESKAPEE group of pathogens. plASgraph2 either outperforms or performs on par with a wide range of state-of-the-art methods on testing sets of independent ESKAPEE samples and samples from related pathogens. On one hand, our study provides a new accurate and easy to use tool for contig classification in bacterial isolates; on the other hand, it serves as a proof-of-concept for the use of GNNs in genomics. Our software is available at https://github.com/cchauve/plasgraph2 and the training and testing data sets are available at https://github.com/fmfi-compbio/plasgraph2-datasets.; https://www.frontiersin.org/articles/10.3389/fmicb.2023.1267695/full; bioinformatics; machine learning (ML); classification; plasmids; assembly graph |
spellingShingle | Janik Sielemann Katharina Sielemann Broňa Brejová Tomáš Vinař Cedric Chauve plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph Frontiers in Microbiology bioinformatics machine learning (ML) classification plasmids assembly graph |
title | plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph |
title_full | plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph |
title_fullStr | plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph |
title_full_unstemmed | plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph |
title_short | plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph |
title_sort | plasgraph2 using graph neural networks to detect plasmid contigs from an assembly graph |
topic | bioinformatics machine learning (ML) classification plasmids assembly graph |
url | https://www.frontiersin.org/articles/10.3389/fmicb.2023.1267695/full |