SortPred: The first machine learning based predictor to identify bacterial sortases and their classes using sequence-derived information

Sortase enzymes are cysteine transpeptidases that embellish the surface of Gram-positive bacteria with various proteins thereby allowing these microorganisms to interact with their neighboring environment. It is known that several of their substrates can cause pathological implications, so researche...

Bibliographic Details
Main Authors: Adeel Malik, Sathiyamoorthy Subramaniyam, Chang-Bae Kim, Balachandran Manavalan
Format: Article
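Language: English
Published: Elsevier 2022-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: Sortase; Machine learning; Random forest; Cysteine transpeptidase; Hybrid features; Bioinformatics
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037021005237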
author Adeel Malik
Sathiyamoorthy Subramaniyam
Chang-Bae Kim
Balachandran Manavalan
collection DOAJ
description Sortase enzymes are cysteine transpeptidases that decorate the surface of Gram-positive bacteria with various proteins, thereby allowing these microorganisms to interact with their surrounding environment. Several of their substrates are known to have pathological implications, so researchers have focused on developing sortase inhibitors. Currently, six classes of sortases (A-F) are recognized. However, with the widespread application of bacterial genome sequencing, the number of potential sortases in public databases has exploded, presenting considerable challenges in annotating these sequences. Characterizing these sortase classes experimentally is laborious and time-consuming. Therefore, this study developed the first machine-learning-based two-layer predictor, SortPred, in which the first layer predicts whether a given sequence is a sortase and the second layer predicts the class of each predicted sortase. To develop SortPred, we constructed an original benchmarking dataset and investigated 31 feature descriptors, based primarily on five feature-encoding algorithms. A random forest classifier was then trained on each of these descriptors, and its robustness was evaluated with an independent dataset. Finally, we selected the final model independently for both layers according to the consistency of performance between cross-validation and independent evaluation. SortPred is expected to be an effective tool for identifying bacterial sortases, which in turn may aid in designing sortase inhibitors and exploring their functions. The SortPred webserver and a standalone version are freely accessible at: https://procarb.org/sortpred.
first_indexed 2024-04-11T05:20:24Z
format Article
id doaj.art-69c1e666e3864fc6840c910707438835
institution Directory Open Access Journal
issn 2001-0370
language English
last_indexed 2024-04-11T05:20:24Z
publishDate 2022-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj.art-69c1e666e3864fc6840c910707438835 2022-12-24T04:50:57Z eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2022-01-01, volume 20, pages 165-174
SortPred: The first machine learning based predictor to identify bacterial sortases and their classes using sequence-derived information
Adeel Malik (Institute of Intelligence Informatics Technology, Sangmyung University, Seoul 03016, Republic of Korea)
Sathiyamoorthy Subramaniyam (Research and Development Center, Insilicogen Inc., Yongin-si 16954, Gyeonggi-do, Republic of Korea)
Chang-Bae Kim (Department of Biotechnology, Sangmyung University, Seoul 03016, Republic of Korea; corresponding author)
Balachandran Manavalan (Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea; corresponding author)
http://www.sciencedirect.com/science/article/pii/S2001037021005237
Sortase
Machine learning
Random forest
Cysteine transpeptidase
Hybrid features
Bioinformatics
title SortPred: The first machine learning based predictor to identify bacterial sortases and their classes using sequence-derived information
topic Sortase
Machine learning
Random forest
Cysteine transpeptidase
Hybrid features
Bioinformatics
url http://www.sciencedirect.com/science/article/pii/S2001037021005237
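
The two-layer design summarized in the description field can be illustrated with a minimal sketch. It is not the SortPred implementation: the record does not name the five feature encodings, so amino acid composition is used here only as an assumed, commonly used descriptor, and the two classifiers are hypothetical stand-in callables rather than the trained random forest models. The function names (`amino_acid_composition`, `two_layer_predict`, `toy_is_sortase`, `toy_sortase_class`) are illustrative and do not come from the paper or its standalone tool.

```python
from collections import Counter
from typing import Callable, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"           # the 20 standard residues
SORTASE_CLASSES = ("A", "B", "C", "D", "E", "F")  # six classes named in the abstract


def amino_acid_composition(sequence: str) -> List[float]:
    """Encode a protein sequence as the fraction of each standard residue.

    Assumption: this is one simple sequence-derived descriptor chosen for
    illustration; the record does not say which encodings SortPred uses.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def two_layer_predict(
    sequence: str,
    is_sortase: Callable[[List[float]], bool],
    sortase_class: Callable[[List[float]], str],
) -> Optional[str]:
    """Run layer 1 (sortase or not); run layer 2 (class) only on positives."""
    features = amino_acid_composition(sequence)
    if not is_sortase(features):
        return None  # layer 1 rejected the sequence, so layer 2 is skipped
    label = sortase_class(features)
    if label not in SORTASE_CLASSES:
        raise ValueError(f"unexpected class label: {label!r}")
    return label


# Hypothetical placeholder models with no biological meaning. In practice
# each would wrap a trained classifier (the paper reports random forests).
def toy_is_sortase(features: List[float]) -> bool:
    return features[AMINO_ACIDS.index("C")] > 0.0


def toy_sortase_class(features: List[float]) -> str:
    return "A"


if __name__ == "__main__":
    for seq in ("MKCAAA", "MKKAAA"):
        print(seq, two_layer_predict(seq, toy_is_sortase, toy_sortase_class))
```

Passing the two layers in as callables keeps the cascade logic separate from the models, so either layer could be swapped for a trained classifier without changing the routing.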