Evolutionary methods for variable selection in the epidemiological modeling of cardiovascular diseases

Abstract. Background: Redundant information is becoming a critical issue for epidemiologists. High-dimensional datasets call for new, effective variable selection methods. This study implements an advanced evolutionary variable selection method which is applied to cardiovascular...

Bibliographic Details
Main Authors: Christina Brester, Jussi Kauhanen, Tomi-Pekka Tuomainen, Sari Voutilainen, Mauno Rönkkö, Kimmo Ronkainen, Eugene Semenkin, Mikko Kolehmainen
Format: Article
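Language: English
Published: BMC 2018-08-01
Series: BioData Mining
Subjects: Variable selection, Cardiovascular disease, Predictive modeling, Kuopio ischemic heart disease risk factor study
Online Access: http://link.springer.com/article/10.1186/s13040-018-0180-x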
_version_ 1811242819252125696
author Christina Brester
Jussi Kauhanen
Tomi-Pekka Tuomainen
Sari Voutilainen
Mauno Rönkkö
Kimmo Ronkainen
Eugene Semenkin
Mikko Kolehmainen
author_sort Christina Brester
collection DOAJ
description Abstract. Background: Redundant information is becoming a critical issue for epidemiologists. High-dimensional datasets call for new, effective variable selection methods. This study implements an advanced evolutionary variable selection method and applies it to cardiovascular predictive modeling. Data from the epidemiological follow-up study KIHD (Kuopio Ischemic Heart Disease Risk Factor Study) were used to compare the designed variable selection method, which is based on an evolutionary search, with conventional stepwise selection. The sample contains 433 predictor variables in total and a response variable indicating incidents of cardiovascular diseases for 1465 study subjects. Results: The effectiveness of the variable selection methods was investigated in combination with two models: Generalized Linear Logistic Regression and Support Vector Machine. The number of variables was reduced from 433 to 38 while preserving the predictive ability of the models. Performance was evaluated with the F-score metric. The highest F-scores obtained were 65.6% before and 67.4% after variable selection. All results were averaged over the 5 folds of a cross-validation procedure. Conclusions: The presented evolutionary variable selection method allows a reduced set of variables relevant to predicting cardiovascular diseases to be chosen. A reference list of the most meaningful variables is introduced as a basis for new epidemiological studies. In general, the multicollinearity of variables means that different combinations of predictors can be used to attain the same model performance.
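The description above names the main ingredients: an evolutionary search over subsets of variables, scored by an F-score averaged over 5 cross-validation folds. The following is a minimal sketch of that general idea only, not the authors' algorithm. It assumes a plain single-objective genetic algorithm with elitism, and it swaps in a nearest-centroid classifier as a stand-in for the logistic regression and SVM models mentioned in the record. All function names and parameters (`f_score`, `cv_f_score`, `evolve_mask`, `pop_size`, `mutation_rate`) are hypothetical and chosen for illustration.

```python
import random


def f_score(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when it is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def centroid_predict(X_train, y_train, X_test, cols):
    """Stand-in classifier (an assumption): pick the nearer class centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        if rows:
            centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    preds = []
    for x in X_test:
        dist = {lab: sum((x[c] - m) ** 2 for c, m in zip(cols, cen))
                for lab, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


def cv_f_score(X, y, mask, k=5):
    """Mean F-score over k interleaved folds, using only columns where mask is 1."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty variable set cannot predict anything
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        X_tr = [x for i, x in enumerate(X) if i not in test_idx]
        y_tr = [t for i, t in enumerate(y) if i not in test_idx]
        X_te = [x for i, x in enumerate(X) if i in test_idx]
        y_te = [t for i, t in enumerate(y) if i in test_idx]
        scores.append(f_score(y_te, centroid_predict(X_tr, y_tr, X_te, cols)))
    return sum(scores) / k


def evolve_mask(X, y, pop_size=20, generations=15, mutation_rate=0.05, seed=0):
    """Simple elitist GA over binary masks; returns the best mask found."""
    rng = random.Random(seed)
    n = len(X[0])  # number of candidate variables (needs n >= 2)

    def fitness(mask):
        return cv_f_score(X, y, mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged, refill with mutated crossovers.
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < mutation_rate) for bit in child])
        pop = elite + children
    return max(pop, key=fitness)
```

Calling `evolve_mask(X, y)` on a list of numeric rows `X` and binary labels `y` returns a 0/1 list with one entry per variable. Because the better half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next.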
first_indexed 2024-04-12T13:56:43Z
format Article
id doaj.art-3972bf4e75d44e209f039a165bfff975
institution Directory Open Access Journal
issn 1756-0381
language English
last_indexed 2024-04-12T13:56:43Z
publishDate 2018-08-01
publisher BMC
record_format Article
series BioData Mining
spelling doaj.art-3972bf4e75d44e209f039a165bfff975
2022-12-22T03:30:20Z
eng
BMC
BioData Mining
1756-0381
2018-08-01
Volume 11, issue 1, pages 1-14
10.1186/s13040-018-0180-x
Evolutionary methods for variable selection in the epidemiological modeling of cardiovascular diseases
Christina Brester (Department of Environmental and Biological Sciences, University of Eastern Finland)
Jussi Kauhanen (Institute of Public Health and Clinical Nutrition, University of Eastern Finland)
Tomi-Pekka Tuomainen (Institute of Public Health and Clinical Nutrition, University of Eastern Finland)
Sari Voutilainen (Institute of Public Health and Clinical Nutrition, University of Eastern Finland)
Mauno Rönkkö (Department of Environmental and Biological Sciences, University of Eastern Finland)
Kimmo Ronkainen (Institute of Public Health and Clinical Nutrition, University of Eastern Finland)
Eugene Semenkin (Institute of Computer Science and Telecommunications, Reshetnev Siberian State University of Science and Technology)
Mikko Kolehmainen (Department of Environmental and Biological Sciences, University of Eastern Finland)
title Evolutionary methods for variable selection in the epidemiological modeling of cardiovascular diseases
topic Variable selection
Cardiovascular disease
Predictive modeling
Kuopio ischemic heart disease risk factor study
url http://link.springer.com/article/10.1186/s13040-018-0180-x