An Augmented Sample Selection Framework for Prediction of Anticancer Peptides
Anticancer peptides (ACPs) have promising prospects for cancer treatment. Traditional ACP identification experiments have the limitations of low efficiency and high cost. In recent years, data-driven deep learning techniques have shown significant potential for ACP prediction. However, data-driven p...
Main Authors: | Huawei Tao, Shuai Shan, Hongliang Fu, Chunhua Zhu, Boye Liu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-09-01 |
Series: | Molecules |
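Subjects: | anticancer peptides; prediction model; data augmentation; noisy samples; uncertainty estimation; confidence |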
Online Access: | https://www.mdpi.com/1420-3049/28/18/6680 |
_version_ | 1797578527867928576 |
---|---|
author | Huawei Tao; Shuai Shan; Hongliang Fu; Chunhua Zhu; Boye Liu |
author_facet | Huawei Tao; Shuai Shan; Hongliang Fu; Chunhua Zhu; Boye Liu |
author_sort | Huawei Tao |
collection | DOAJ |
description | Anticancer peptides (ACPs) have promising prospects for cancer treatment. Traditional ACP identification experiments have the limitations of low efficiency and high cost. In recent years, data-driven deep learning techniques have shown significant potential for ACP prediction. However, data-driven prediction models rely heavily on extensive training data. Furthermore, the current publicly accessible ACP dataset is limited in size, leading to inadequate model generalization. While data augmentation effectively expands dataset size, existing techniques for augmenting ACP data often generate noisy samples, adversely affecting prediction performance. Therefore, this paper proposes a novel augmented sample selection framework for the prediction of anticancer peptides (ACPs-ASSF). First, the prediction model is trained using raw data. Then, the augmented samples generated using the data augmentation technique are fed into the trained model to compute pseudo-labels and estimate the uncertainty of the model prediction. Finally, samples with low uncertainty, high confidence, and pseudo-labels consistent with the original labels are selected and incorporated into the training set to retrain the model. The evaluation results for the ACP240 and ACP740 datasets show that ACPs-ASSF achieved accuracy improvements of up to 5.41% and 5.68%, respectively, compared to the traditional data augmentation method. |
first_indexed | 2024-03-10T22:24:03Z |
format | Article |
id | doaj.art-49dc32ecd1fc4c328b6327066ae5b667 |
institution | Directory Open Access Journal |
issn | 1420-3049 |
language | English |
last_indexed | 2024-03-10T22:24:03Z |
publishDate | 2023-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Molecules |
spelling | doaj.art-49dc32ecd1fc4c328b6327066ae5b667; 2023-11-19T12:10:54Z; eng; MDPI AG; Molecules; 1420-3049; 2023-09-01; 28; 18; 6680; 10.3390/molecules28186680; An Augmented Sample Selection Framework for Prediction of Anticancer Peptides; Huawei Tao [0]; Shuai Shan [1]; Hongliang Fu [2]; Chunhua Zhu [3]; Boye Liu [4]; [0] Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; [1] Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; [2] Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; [3] Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; [4] College of Food Science and Engineering, Henan University of Technology, Zhengzhou 450001, China; Anticancer peptides (ACPs) have promising prospects for cancer treatment. Traditional ACP identification experiments have the limitations of low efficiency and high cost. In recent years, data-driven deep learning techniques have shown significant potential for ACP prediction. However, data-driven prediction models rely heavily on extensive training data. Furthermore, the current publicly accessible ACP dataset is limited in size, leading to inadequate model generalization. While data augmentation effectively expands dataset size, existing techniques for augmenting ACP data often generate noisy samples, adversely affecting prediction performance. Therefore, this paper proposes a novel augmented sample selection framework for the prediction of anticancer peptides (ACPs-ASSF). First, the prediction model is trained using raw data. Then, the augmented samples generated using the data augmentation technique are fed into the trained model to compute pseudo-labels and estimate the uncertainty of the model prediction. Finally, samples with low uncertainty, high confidence, and pseudo-labels consistent with the original labels are selected and incorporated into the training set to retrain the model. The evaluation results for the ACP240 and ACP740 datasets show that ACPs-ASSF achieved accuracy improvements of up to 5.41% and 5.68%, respectively, compared to the traditional data augmentation method.; https://www.mdpi.com/1420-3049/28/18/6680; anticancer peptides; prediction model; data augmentation; noisy samples; uncertainty estimation; confidence |
title | An Augmented Sample Selection Framework for Prediction of Anticancer Peptides |
title_full | An Augmented Sample Selection Framework for Prediction of Anticancer Peptides |
title_fullStr | An Augmented Sample Selection Framework for Prediction of Anticancer Peptides |
title_full_unstemmed | An Augmented Sample Selection Framework for Prediction of Anticancer Peptides |
title_short | An Augmented Sample Selection Framework for Prediction of Anticancer Peptides |
title_sort | augmented sample selection framework for prediction of anticancer peptides |
topic | anticancer peptides; prediction model; data augmentation; noisy samples; uncertainty estimation; confidence |
url | https://www.mdpi.com/1420-3049/28/18/6680 |
work_keys_str_mv | AT huaweitao anaugmentedsampleselectionframeworkforpredictionofanticancerpeptides AT shuaishan anaugmentedsampleselectionframeworkforpredictionofanticancerpeptides AT hongliangfu anaugmentedsampleselectionframeworkforpredictionofanticancerpeptides AT chunhuazhu anaugmentedsampleselectionframeworkforpredictionofanticancerpeptides AT boyeliu anaugmentedsampleselectionframeworkforpredictionofanticancerpeptides AT huaweitao augmentedsampleselectionframeworkforpredictionofanticancerpeptides AT shuaishan augmentedsampleselectionframeworkforpredictionofanticancerpeptides AT hongliangfu augmentedsampleselectionframeworkforpredictionofanticancerpeptides AT chunhuazhu augmentedsampleselectionframeworkforpredictionofanticancerpeptides AT boyeliu augmentedsampleselectionframeworkforpredictionofanticancerpeptides |
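
The selection step summarized in the description field (keep augmented samples with low uncertainty, high confidence, and a pseudo-label that matches the original label) can be illustrated with a minimal sketch. This is not the authors' implementation. The record does not say how uncertainty is estimated, so the sketch assumes several stochastic forward passes (for example, with dropout left active) and uses the standard deviation of the predicted-class probability as the uncertainty. The function name `select_augmented`, the thresholds `conf_min` and `unc_max`, and the toy probabilities are all hypothetical.

```python
import numpy as np


def select_augmented(probs_mc, original_labels, conf_min=0.9, unc_max=0.05):
    """Pick augmented samples that pass all three selection criteria.

    probs_mc: array of shape (T, N, C) with class probabilities from T
        stochastic forward passes over N augmented samples (assumed setup).
    original_labels: array of shape (N,), the label each augmented sample
        inherited from its source peptide.
    Returns a boolean keep-mask of shape (N,) and the pseudo-labels.
    """
    probs_mc = np.asarray(probs_mc, dtype=float)
    original_labels = np.asarray(original_labels)

    mean_probs = probs_mc.mean(axis=0)          # (N, C) averaged prediction
    pseudo_labels = mean_probs.argmax(axis=1)   # (N,) predicted class
    confidence = mean_probs.max(axis=1)         # (N,) probability of that class

    # Uncertainty (assumption): spread of the predicted-class probability
    # across the T passes.
    sample_idx = np.arange(mean_probs.shape[0])
    uncertainty = probs_mc[:, sample_idx, pseudo_labels].std(axis=0)

    keep = (
        (confidence >= conf_min)
        & (uncertainty <= unc_max)
        & (pseudo_labels == original_labels)
    )
    return keep, pseudo_labels


# Toy input: 3 passes, 4 augmented samples, 2 classes (values are made up).
toy_probs = np.array([
    [[0.95, 0.05], [0.90, 0.10], [0.02, 0.98], [0.05, 0.95]],
    [[0.95, 0.05], [0.50, 0.50], [0.02, 0.98], [0.08, 0.92]],
    [[0.95, 0.05], [0.95, 0.05], [0.02, 0.98], [0.03, 0.97]],
])
toy_labels = np.array([0, 0, 0, 1])

keep_mask, pseudo = select_augmented(toy_probs, toy_labels)
print(keep_mask.tolist())  # which augmented samples would join the training set
```

In this toy input, the second sample is rejected for low confidence and unstable predictions, and the third because its pseudo-label disagrees with its original label. The retained samples would then be appended to the raw training data before retraining.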