MLACP 2.0: An updated machine learning tool for anticancer peptide prediction

Anticancer peptides are an emerging class of anticancer drugs that offer fewer side effects and are more effective than chemotherapy and targeted therapy. Predicting anticancer peptides from sequence information is one of the most challenging tasks in immunoinformatics. In the past ten years, machine learning-ba...


Bibliographic Details
Main Authors: Le Thi Phan, Hyun Woo Park, Thejkiran Pitti, Thirumurthy Madhavan, Young-Jun Jeon, Balachandran Manavalan
Format: Article
Language: English
Published: Elsevier 2022-01-01
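Series: Computational and Structural Biotechnology Journal
Subjects: Anticancer peptides; Convolutional neural network; Feature encodings; Conventional classifiers; Baseline models; Dataset construction
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037022003245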
author Le Thi Phan
Hyun Woo Park
Thejkiran Pitti
Thirumurthy Madhavan
Young-Jun Jeon
Balachandran Manavalan
collection DOAJ
description Anticancer peptides (ACPs) are an emerging class of anticancer drugs that offer fewer side effects and are more effective than chemotherapy and targeted therapy. Predicting anticancer peptides from sequence information is one of the most challenging tasks in immunoinformatics. Over the past ten years, machine learning-based approaches have been proposed for identifying ACP activity from peptide sequences. These methods include our previous method, MLACP (developed in 2017), which made a significant impact on anticancer research. The MLACP tool has been widely used by the research community; however, its robustness must be improved significantly for continued practical application. In this study, the first large non-redundant training and independent datasets were constructed for ACP research. Using the training dataset, the study explored a wide range of feature encodings and developed their respective models using seven different conventional classifiers. Subsequently, a subset of encoding-based models was selected for each classifier based on performance; their predicted scores were concatenated and used to train a convolutional neural network (CNN), and the resulting predictor is named MLACP 2.0. The evaluation of MLACP 2.0 on a very diverse independent dataset showed excellent performance, and it significantly outperformed recent ACP prediction tools. Additionally, MLACP 2.0 exhibits superior performance during cross-validation and independent assessment when compared to CNN-based embedding models and conventional single models. Consequently, we anticipate that MLACP 2.0 will facilitate the design of hypothesis-driven experiments by making it easier to discover novel ACPs. MLACP 2.0 is freely available at https://balalab-skku.org/mlacp2.
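The stacking idea in the description (several feature encodings, one base model per encoding, predicted scores concatenated and passed to a second-stage learner) can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the two encodings, the nearest-centroid base model, the eight toy sequences (synthetic, not real ACP data), and all function names. In particular, a logistic regression stands in for the CNN to keep the example dependency-free, and base scores are computed in-sample for brevity, whereas a careful stacking setup would use out-of-fold predictions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Illustrative physicochemical grouping (an assumption, not the article's encoding).
GROUPS = {"hydrophobic": "AVLIMFWC", "polar": "STNQYGP",
          "positive": "KRH", "negative": "DE"}

def aac(seq):
    """Amino acid composition: frequency of each of the 20 residues."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def group_composition(seq):
    """Fraction of residues falling in each physicochemical group."""
    return [sum(seq.count(a) for a in members) / len(seq)
            for members in GROUPS.values()]

ENCODERS = [aac, group_composition]

def fit_centroids(X, y):
    """Base model: store the mean feature vector of each class."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, label in zip(X, y) if label == 1])
    neg = mean([x for x, label in zip(X, y) if label == 0])
    return pos, neg

def centroid_score(model, x):
    """Score in [0, 1]; higher means closer to the positive centroid."""
    pos, neg = model
    d_pos, d_neg = math.dist(x, pos), math.dist(x, neg)
    return d_neg / (d_pos + d_neg + 1e-12)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(Z, y, lr=0.5, epochs=2000):
    """Second-stage learner: logistic regression trained by plain SGD."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, label in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - label
            w = [wi - lr * err * zi for wi, zi in zip(w, z)]
            b -= lr * err
    return w, b

def stacked_scores(base_models, seq):
    """Concatenate the base-model scores, one per encoding."""
    return [centroid_score(m, enc(seq)) for m, enc in zip(base_models, ENCODERS)]

def train_stack(seqs, labels):
    base_models = [fit_centroids([enc(s) for s in seqs], labels) for enc in ENCODERS]
    # In-sample scores for brevity; real stacking would use out-of-fold scores.
    Z = [stacked_scores(base_models, s) for s in seqs]
    return base_models, fit_logistic(Z, labels)

def predict(stack, seq):
    """Probability-like output of the second-stage learner for one peptide."""
    base_models, (w, b) = stack
    z = stacked_scores(base_models, seq)
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)

# Synthetic toy sequences: cationic/hydrophobic = 1, acidic/polar = 0.
TOY_SEQS = ["KKLLKKLLKK", "KWKLFKKIGK", "RRWWRRFFKK", "KLAKLAKKLA",
            "DEDEGSGSDE", "GSGSGSDDEE", "EEDDSTSTGG", "DDEESGSGTT"]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

stack = train_stack(TOY_SEQS, TOY_LABELS)
print(predict(stack, "KKWLKKLAKK"), predict(stack, "EDGSEDGSTD"))
```

The second stage only ever sees the concatenated base scores, so adding another encoding or base classifier changes the length of the stacked vector but not the overall flow.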
first_indexed 2024-04-11T05:19:12Z
format Article
id doaj.art-1ee053f896bb4bd88d6579394bee713a
institution Directory Open Access Journal
issn 2001-0370
language English
last_indexed 2024-04-11T05:19:12Z
publishDate 2022-01-01
publisher Elsevier
record_format Article
series Computational and Structural Biotechnology Journal
spelling doaj.art-1ee053f896bb4bd88d6579394bee713a
2022-12-24T04:53:39Z
eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2022-01-01
Volume 20, pages 4473-4480
MLACP 2.0: An updated machine learning tool for anticancer peptide prediction
Le Thi Phan (0), Hyun Woo Park (1), Thejkiran Pitti (2), Thirumurthy Madhavan (3), Young-Jun Jeon (4), Balachandran Manavalan (5)
Affiliations 0, 2, 5: Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
Affiliations 1, 4: Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
Affiliation 3: Computational Biology Lab, Department of Genetic Engineering, SRM Institute of Science & Technology, Kattankulathur 603203, Tamil Nadu, India
Corresponding authors: Young-Jun Jeon, Balachandran Manavalan
title MLACP 2.0: An updated machine learning tool for anticancer peptide prediction
topic Anticancer peptides
Convolutional neural network
Feature encodings
Conventional classifiers
Baseline models
Dataset construction
url http://www.sciencedirect.com/science/article/pii/S2001037022003245