A Comparative Analysis of Novel Deep Learning and Ensemble Learning Models to Predict the Allergenicity of Food Proteins

Bibliographic Details
Main Authors: Liyang Wang, Dantong Niu, Xinjie Zhao, Xiaoya Wang, Mengzhen Hao, Huilian Che
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Foods
Subjects: food allergens; allergenicity prediction; deep learning; ensemble learning; comparative analysis
Online Access:https://www.mdpi.com/2304-8158/10/4/809
author Liyang Wang
Dantong Niu
Xinjie Zhao
Xiaoya Wang
Mengzhen Hao
Huilian Che
collection DOAJ
description Traditional food allergen identification relies mainly on in vivo and in vitro experiments, which are time-consuming and costly. Rapid, artificial intelligence (AI)-driven identification methods address some of these drawbacks and are becoming efficient auxiliary tools. To overcome the comparatively low accuracy of traditional machine learning models in predicting the allergenicity of food proteins, this work introduced a deep learning model, a transformer with a self-attention mechanism, together with ensemble learning models, represented by the Light Gradient Boosting Machine (LightGBM) and eXtreme Gradient Boosting (XGBoost). To highlight the advantages of the proposed methods, the study also used several commonly used machine learning models as baseline classifiers. In 5-fold cross-validation, the deep model achieved the highest area under the receiver operating characteristic curve (AUC), 0.9578, outperforming both the ensemble learning and the baseline algorithms. However, the deep model requires pre-training and has the longest training time. A comparison of the transformer and boosting models shows that each has its own advantages, which offers new clues and inspiration for the rapid prediction of food allergens in the future.
first_indexed 2024-03-10T12:28:42Z
format Article
id doaj.art-1c25e8192efe4dd99e3d2d9223c44bcb
institution Directory Open Access Journal
issn 2304-8158
language English
last_indexed 2024-03-10T12:28:42Z
publishDate 2021-04-01
publisher MDPI AG
record_format Article
series Foods
doi 10.3390/foods10040809
volume 10
issue 4
article_number 809
affiliation Liyang Wang, Xiaoya Wang, Mengzhen Hao, Huilian Che: Key Laboratory of Precision Nutrition and Food Quality, The Ministry of Education, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China
Dantong Niu: College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
Xinjie Zhao: College of Humanities and Development Studies, China Agricultural University, Beijing 100083, China
title A Comparative Analysis of Novel Deep Learning and Ensemble Learning Models to Predict the Allergenicity of Food Proteins
topic food allergens
allergenicity prediction
deep learning
ensemble learning
comparative analysis
url https://www.mdpi.com/2304-8158/10/4/809
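The description reports model quality as the AUC obtained under 5-fold cross-validation. The following is a minimal, standard-library-only Python sketch of that evaluation protocol under stated assumptions. It is not the authors' code: the transformer, LightGBM and XGBoost models are replaced by a hypothetical nearest-centroid scorer (`fit_centroid_scorer`), folds are stratified by class (the abstract does not say whether they were), and the demo features are synthetic. AUC is computed in its Mann-Whitney form, the probability that a randomly chosen positive is scored above a randomly chosen negative.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC in its Mann-Whitney form: the probability that a random positive
    is scored higher than a random negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def stratified_k_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, dealing each class out
    round-robin so every fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_centroid_scorer(X, y):
    """Hypothetical stand-in model: score = squared distance to the negative
    centroid minus squared distance to the positive centroid."""
    dim = len(X[0])

    def centroid(cls):
        rows = [x for x, t in zip(X, y) if t == cls]
        return [mean(r[d] for r in rows) for d in range(dim)]

    c_pos, c_neg = centroid(1), centroid(0)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Higher score means "closer to the positive (allergen) centroid".
    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)


def cross_validated_auc(X, y, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and average the k AUCs."""
    folds = stratified_k_folds(y, k, seed)
    aucs = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        scorer = fit_centroid_scorer([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [scorer(X[i]) for i in test_idx]
        aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return mean(aucs), aucs


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic 2-D features (hypothetical): label 1 = "allergen", 0 = "non-allergen".
    X = [[rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(50)]
    X += [[rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)] for _ in range(50)]
    y = [1] * 50 + [0] * 50
    mean_auc, per_fold = cross_validated_auc(X, y, k=5)
    print(f"mean AUC over {len(per_fold)} folds: {mean_auc:.4f}")
```

Swapping in a real classifier only changes `fit_centroid_scorer`; the fold split and the averaging of per-fold AUCs stay the same, which is what keeps a cross-validated AUC comparable across deep, ensemble and baseline models evaluated on the same folds.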