Unsupervised encoding selection through ensemble pruning for biomedical classification

Abstract Background Owing to the rising levels of multi-resistant pathogens, antimicrobial peptides, an alternative to classic antibiotics, have gained more attention. A crucial yet costly part is their identification and validation. With the ever-growing number of annotated peptides, researchers...

Bibliographic Details
Main Authors:	Sebastian Spänig, Alexander Michel, Dominik Heider
Format:	Article
Language:	English
Published:	BMC 2023-03-01
Series:	BioData Mining
Subjects:	Biomedical classification, Antimicrobial peptides, Encodings, Machine learning, Ensemble learning
Online Access:	https://doi.org/10.1186/s13040-022-00317-7

author	Sebastian Spänig, Alexander Michel, Dominik Heider
collection	DOAJ
description	Abstract Background Owing to the rising levels of multi-resistant pathogens, antimicrobial peptides, an alternative to classic antibiotics, have gained more attention. A crucial yet costly part is their identification and validation. With the ever-growing number of annotated peptides, researchers leverage artificial intelligence to circumvent the cumbersome, wet-lab-based identification and automate the detection of promising candidates. However, the prediction of a peptide’s function is not limited to antimicrobial efficiency. To date, multiple studies have successfully classified additional properties, e.g., antiviral or cell-penetrating effects. In this light, ensemble classifiers are employed to further improve the prediction. Although we recently presented a workflow to significantly narrow down the initial encoding choice, an entirely unsupervised encoding selection that considers various machine learning models is still lacking. Results We developed a workflow that automatically selects encodings and generates classifier ensembles by employing sophisticated pruning methods. We observed that Pareto frontier pruning is a good method to create encoding ensembles for the datasets at hand. In addition, encodings combined with the Decision Tree classifier as the base model are often superior. However, our results also demonstrate that none of the ensemble building techniques is outstanding for all datasets. Conclusion The workflow conducts multiple pruning methods to evaluate ensemble classifiers composed of a wide range of peptide encodings and base models. Consequently, researchers can use the workflow for unsupervised encoding selection and ensemble creation. Ultimately, the extensible workflow can be used as a plugin for the PEPTIDE REACToR, further establishing it as a versatile tool in the domain.
first_indexed	2024-04-09T23:07:06Z
format	Article
id	doaj.art-1d007c3a0bd4446dbd9986c86630713d
institution	Directory Open Access Journal
issn	1756-0381
language	English
last_indexed	2024-04-09T23:07:06Z
publishDate	2023-03-01
publisher	BMC
record_format	Article
series	BioData Mining
spelling	doaj.art-1d007c3a0bd4446dbd9986c86630713d; 2023-03-22T10:35:34Z; eng; BMC; BioData Mining; 1756-0381; 2023-03-01; volume 16, issue 1, pages 1-20; 10.1186/s13040-022-00317-7; Unsupervised encoding selection through ensemble pruning for biomedical classification; Sebastian Spänig, Alexander Michel, Dominik Heider (all: Data Science in Biomedicine, Department of Mathematics and Computer Science, University of Marburg); https://doi.org/10.1186/s13040-022-00317-7; Biomedical classification, Antimicrobial peptides, Encodings, Machine learning, Ensemble learning
title	Unsupervised encoding selection through ensemble pruning for biomedical classification
topic	Biomedical classification, Antimicrobial peptides, Encodings, Machine learning, Ensemble learning
url	https://doi.org/10.1186/s13040-022-00317-7
