Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES)

Abstract Background Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performa...

Bibliographic Details
Main Authors:	Yongjun Choi, Junho Cha, Sungkyoung Choi
Format:	Article
Language:	English
Published:	BMC 2024-02-01
Series:	BMC Bioinformatics
Subjects:	Disease risk prediction model; Large-scale genetic data; Asthma; Penalized methods; Machine learning methods; Ensemble methods
Online Access:	https://doi.org/10.1186/s12859-024-05677-x

_version_	1797273062234652672
author	Yongjun Choi, Junho Cha, Sungkyoung Choi
collection	DOAJ
description	Abstract. Background: Genome-wide association studies have successfully identified genetic variants associated with human disease. Various statistical approaches based on penalized and machine learning methods have recently been proposed for disease prediction. In this study, we evaluated the performance of several such methods for predicting asthma using the Korean Chip (KORV1.1) from the Korean Genome and Epidemiology Study (KoGES). Results: First, single-nucleotide polymorphisms were selected via single-variant tests using logistic regression adjusted for several epidemiological factors. Next, we evaluated the following methods for disease prediction: ridge, least absolute shrinkage and selection operator, elastic net, smoothly clipped absolute deviation, support vector machine, random forest, boosting, bagging, naïve Bayes, and k-nearest neighbor. Finally, we compared their predictive performance based on the area under the receiver operating characteristic curve, precision, recall, F1-score, Cohen's kappa, balanced accuracy, error rate, Matthews correlation coefficient, and area under the precision-recall curve. Additionally, three oversampling algorithms were used to address class imbalance. Conclusions: Our results show that penalized methods achieved better predictive performance for asthma than machine learning methods. In the oversampling study, however, random forest and boosting methods overall showed better predictive performance than penalized methods.
first_indexed	2024-03-07T14:38:11Z
format	Article
id	doaj.art-cbfe60f1785c402bae0e11d76f409ba7
institution	Directory of Open Access Journals
issn	1471-2105
language	English
last_indexed	2024-03-07T14:38:11Z
publishDate	2024-02-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-cbfe60f1785c402bae0e11d76f409ba7; 2024-03-05T20:31:47Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2024-02-01; 251127; 10.1186/s12859-024-05677-x; Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES); Yongjun Choi (0), Junho Cha (1), Sungkyoung Choi (2); Department of Applied Artificial Intelligence, College of Computing, Hanyang University (listed for each of the three authors); https://doi.org/10.1186/s12859-024-05677-x
title	Evaluation of penalized and machine learning methods for asthma disease prediction in the Korean Genome and Epidemiology Study (KoGES)
topic	Disease risk prediction model; Large-scale genetic data; Asthma; Penalized methods; Machine learning methods; Ensemble methods
url	https://doi.org/10.1186/s12859-024-05677-x
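
Several of the threshold-based metrics named in the abstract (precision, recall, F1-score, Cohen's kappa, balanced accuracy, error rate, Matthews correlation coefficient) can all be derived from a single binary confusion matrix. The following is a minimal Python sketch of those standard formulas, not the authors' code; the function name `classification_metrics` and the example counts are hypothetical, and it assumes every denominator is nonzero.

```python
from math import sqrt


def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes all denominators are nonzero.
    """
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity (true positive rate)
    specificity = tn / (tn + fp)       # true negative rate
    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the row and column marginals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2

    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
        "error_rate": 1 - accuracy,
        "kappa": (accuracy - p_expected) / (1 - p_expected),
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Made-up counts for an imbalanced sample: 60 cases, 140 controls.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced data such as a case-control sample with few cases, balanced accuracy, kappa, and the Matthews correlation coefficient are less flattered by the majority class than plain accuracy, which is one common reason to report them side by side.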

Similar Items