Coupled encoding methods for antimicrobial peptide prediction: How sensitive is a highly accurate model?

Bibliographic Details
Main Authors: Ivan Erjavac, Daniela Kalafatovic, Goran Mauša
Format: Article
Language: English
Published: Elsevier 2022-12-01
Series: Artificial Intelligence in the Life Sciences
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2667318522000058
Description
Summary: Current application of machine learning in the process of antimicrobial peptide discovery calls for the reduction of the false positive predictions that are produced by the classification models. Considering that the positive predictions of high confidence drive modern experimental design, the model’s sensitivity is crucial to reduce the number of unnecessary in vitro tests. Furthermore, taking into account the expert-based design approaches that employ random mutations on confirmed sequences, the machine learning models are required to distinguish between subtle differences among shuffled sequences. With the goal of reducing the false positive rate and improving sensitivity, we propose a hybrid approach to antimicrobial peptide prediction that utilizes combined encoding models. To this end, we implement models that employ both the physico-chemical features and sequence ordering information to stress the importance of using both representations. We also investigate the usage of binary encoding for peptide representation purposes, a method that is insufficiently represented in related research and that proved to be a viable low-dimensional alternative to the one-hot encoding. Our results, supported by Cochran and McNemar statistical tests and Spearman correlation analysis, indicate that the sequence-based encodings complement the physico-chemical features and that their synergic effect yields improvement in terms of every evaluation metric. Finally, the proposed hybrid approach that combines physico-chemical features and binary encoding using logical conjunction was shown to be superior to the single models by a factor of 2.96 in terms of fall-out and by up to 6.1% in terms of precision.
ISSN: 2667-3185
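
Illustration: The summary mentions three ideas that a short sketch may make more concrete: binary encoding as a lower-dimensional alternative to one-hot encoding, combining two models by logical conjunction, and fall-out (false positive rate) as an evaluation metric. The Python below is only a minimal sketch under stated assumptions, not the authors' implementation. The residue ordering, the 5-bit code assignment, all function names, and the toy labels and predictions are hypothetical and were chosen purely for illustration.

```python
from math import ceil, log2

# Assumption: the 20 standard residues in alphabetical one-letter order.
# The article's actual code assignment is not given in this record.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BITS = ceil(log2(len(AMINO_ACIDS)))  # 5 bits are enough for 20 symbols


def one_hot_encode(seq):
    """Flattened one-hot encoding: 20 values per residue."""
    out = []
    for aa in seq:
        vec = [0] * len(AMINO_ACIDS)
        vec[AMINO_ACIDS.index(aa)] = 1
        out.extend(vec)
    return out


def binary_encode(seq):
    """Flattened binary encoding: residue index as a fixed-width bit code."""
    out = []
    for aa in seq:
        idx = AMINO_ACIDS.index(aa)
        out.extend(int(bit) for bit in format(idx, f"0{BITS}b"))
    return out


def conjunction(pred_a, pred_b):
    """Hybrid prediction: positive only if both models predict positive."""
    return [a & b for a, b in zip(pred_a, pred_b)]


def fall_out(y_true, y_pred):
    """Fall-out (false positive rate) = FP / (FP + TN)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn)


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 0, 0, 0, 0]
pred_physchem = [1, 1, 1, 0, 1, 0]   # hypothetical physico-chemical model
pred_sequence = [1, 1, 0, 1, 1, 0]   # hypothetical binary-encoding model
pred_hybrid = conjunction(pred_physchem, pred_sequence)

if __name__ == "__main__":
    print(len(one_hot_encode("GIGK")), len(binary_encode("GIGK")))
    print(fall_out(y_true, pred_physchem),
          fall_out(y_true, pred_sequence),
          fall_out(y_true, pred_hybrid))
```

In this toy data, each single model misclassifies two of the four negatives, while the conjunction keeps only the one false positive that both models share. A conjunction can never add false positives relative to either constituent model, but it can discard true positives on which the models disagree, so the gain in fall-out and precision has to be weighed against a possible loss of recall.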