An Augmented Sample Selection Framework for Prediction of Anticancer Peptides

Anticancer peptides (ACPs) have promising prospects for cancer treatment. Traditional ACP identification experiments have the limitations of low efficiency and high cost. In recent years, data-driven deep learning techniques have shown significant potential for ACP prediction. However, data-driven p...

Bibliographic Details
Main Authors: Huawei Tao, Shuai Shan, Hongliang Fu, Chunhua Zhu, Boye Liu
Format: Article
Language: English
Published: MDPI AG 2023-09-01
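Series: Molecules
Subjects: anticancer peptides; prediction model; data augmentation; noisy samples; uncertainty estimation; confidence
Online Access: https://www.mdpi.com/1420-3049/28/18/6680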
author Huawei Tao
Shuai Shan
Hongliang Fu
Chunhua Zhu
Boye Liu
collection DOAJ
description Anticancer peptides (ACPs) have promising prospects for cancer treatment. Traditional ACP identification experiments have the limitations of low efficiency and high cost. In recent years, data-driven deep learning techniques have shown significant potential for ACP prediction. However, data-driven prediction models rely heavily on extensive training data. Furthermore, the current publicly accessible ACP dataset is limited in size, leading to inadequate model generalization. While data augmentation effectively expands dataset size, existing techniques for augmenting ACP data often generate noisy samples, adversely affecting prediction performance. Therefore, this paper proposes a novel augmented sample selection framework for the prediction of anticancer peptides (ACPs-ASSF). First, the prediction model is trained using raw data. Then, the augmented samples generated using the data augmentation technique are fed into the trained model to compute pseudo-labels and estimate the uncertainty of the model prediction. Finally, samples with low uncertainty, high confidence, and pseudo-labels consistent with the original labels are selected and incorporated into the training set to retrain the model. The evaluation results for the ACP240 and ACP740 datasets show that ACPs-ASSF achieved accuracy improvements of up to 5.41% and 5.68%, respectively, compared to the traditional data augmentation method.
first_indexed 2024-03-10T22:24:03Z
format Article
id doaj.art-49dc32ecd1fc4c328b6327066ae5b667
institution Directory Open Access Journal
issn 1420-3049
language English
last_indexed 2024-03-10T22:24:03Z
publishDate 2023-09-01
publisher MDPI AG
record_format Article
series Molecules
spelling doaj.art-49dc32ecd1fc4c328b6327066ae5b667
2023-11-19T12:10:54Z
eng
MDPI AG
Molecules
1420-3049
2023-09-01
Volume 28, Issue 18, Article 6680
DOI: 10.3390/molecules28186680
An Augmented Sample Selection Framework for Prediction of Anticancer Peptides
Huawei Tao (Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China)
Shuai Shan (Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China)
Hongliang Fu (Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China)
Chunhua Zhu (Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China)
Boye Liu (College of Food Science and Engineering, Henan University of Technology, Zhengzhou 450001, China)
https://www.mdpi.com/1420-3049/28/18/6680
title An Augmented Sample Selection Framework for Prediction of Anticancer Peptides
topic anticancer peptides
prediction model
data augmentation
noisy samples
uncertainty estimation
confidence
url https://www.mdpi.com/1420-3049/28/18/6680