plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph

Bibliographic Details
Main Authors: Janik Sielemann, Katharina Sielemann, Broňa Brejová, Tomáš Vinař, Cedric Chauve
Format: Article
Language: English
Published: Frontiers Media S.A. 2023-10-01
Series: Frontiers in Microbiology
Subjects: bioinformatics; machine learning (ML); classification; plasmids; assembly graph
Online Access: https://www.frontiersin.org/articles/10.3389/fmicb.2023.1267695/full
author Janik Sielemann
Katharina Sielemann
Broňa Brejová
Tomáš Vinař
Cedric Chauve
collection DOAJ
description Identification of plasmids from sequencing data is an important and challenging problem related to antimicrobial resistance spread and other One-Health issues. We provide a new architecture for identifying plasmid contigs in fragmented genome assemblies built from short-read data. We employ graph neural networks (GNNs) and the assembly graph to propagate the information from nearby nodes, which leads to more accurate classification, especially for short contigs that are difficult to classify based on sequence features or database searches alone. We trained plASgraph2 on a data set of samples from the ESKAPEE group of pathogens. plASgraph2 either outperforms or performs on par with a wide range of state-of-the-art methods on testing sets of independent ESKAPEE samples and samples from related pathogens. On one hand, our study provides a new accurate and easy to use tool for contig classification in bacterial isolates; on the other hand, it serves as a proof-of-concept for the use of GNNs in genomics. Our software is available at https://github.com/cchauve/plasgraph2 and the training and testing data sets are available at https://github.com/fmfi-compbio/plasgraph2-datasets.
first_indexed 2024-03-11T19:41:37Z
format Article
id doaj.art-8eb508d1a23c4ed18ec5b9795241fa83
institution Directory Open Access Journal
issn 1664-302X
language English
last_indexed 2024-03-11T19:41:37Z
publishDate 2023-10-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Microbiology
spelling doaj.art-8eb508d1a23c4ed18ec5b9795241fa83
2023-10-06T09:50:58Z
eng
Frontiers Media S.A.
Frontiers in Microbiology
1664-302X
2023-10-01
Volume 14
DOI 10.3389/fmicb.2023.1267695
Article 1267695
plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph
Janik Sielemann [0]
Katharina Sielemann [1]
Broňa Brejová [2]
Tomáš Vinař [3]
Cedric Chauve [4]
[0] Computational Biology, Faculty of Biology, Center for Biotechnology & Graduate School Digital Infrastructures for the Life Sciences (DILS), Bielefeld Institute for Bioinformatics Infrastructure, Bielefeld University, Bielefeld, Germany
[1] Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology & Graduate School Digital Infrastructures for the Life Sciences (DILS), Bielefeld Institute for Bioinformatics Infrastructure, Bielefeld University, Bielefeld, Germany
[2] Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia
[3] Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia
[4] Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
https://www.frontiersin.org/articles/10.3389/fmicb.2023.1267695/full
bioinformatics
machine learning (ML)
classification
plasmids
assembly graph
title plASgraph2: using graph neural networks to detect plasmid contigs from an assembly graph
topic bioinformatics
machine learning (ML)
classification
plasmids
assembly graph
url https://www.frontiersin.org/articles/10.3389/fmicb.2023.1267695/full