Health-Related Data Analysis Using Metaheuristic Optimization and Machine Learning

Health-related data plays a decisive role in disease diagnosis. Extracting relevant information from medical records is facilitated by evaluating the features of the data. Prior research has shown that outcomes are significantly affected by the use of feature selection...


Bibliographic Details
Main Authors: Annisa Darmawahyuni, Siti Nurmaini, Bambang Tutuko, Muhammad Naufal Rachmatullah, Firdaus Firdaus, Ade Iriani Sapitri, Anggun Islami, Jordan Marcelino, Rendy Isdwanta, Muhammad Irfan Karim
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
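Subjects: Autoencoder, classification, data imputation, feature selection, health-related dataset, metaheuristic algorithms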
Online Access: https://ieeexplore.ieee.org/document/10500837/
author Annisa Darmawahyuni
Siti Nurmaini
Bambang Tutuko
Muhammad Naufal Rachmatullah
Firdaus Firdaus
Ade Iriani Sapitri
Anggun Islami
Jordan Marcelino
Rendy Isdwanta
Muhammad Irfan Karim
author_sort Annisa Darmawahyuni
collection DOAJ
description Health-related data plays a decisive role in disease diagnosis. Extracting relevant information from medical records is facilitated by evaluating the features of the data. Prior research has shown that feature selection (FS) significantly affects outcomes across different medical domains. FS identifies the most significant features to improve classification accuracy. It aims to minimize the number of input variables and the computational overhead while maximizing classification performance. However, identifying the optimal features is difficult because health-related data often combine high dimensionality with small sample sizes. Metaheuristic optimization algorithms (MOAs) play an important role in generating the best feature subset through exploration and exploitation phases. This study experiments with well-known MOAs and supervised learning on datasets from the UC Irvine Machine Learning Repository, PhysioNet, the Kent Ridge Bio-Medical Dataset, and the MIMIC-III v1.4 repository, with varying feature dimensions. To improve the quality of health-related data, this study proposes missing-data imputation based on a deep learning approach, an autoencoder (AE). AE imputation achieves a mean squared error (MSE) of 0.0167 and a root mean squared error (RMSE) of 0.129. The MOAs select minimal feature subsets while maintaining strong performance on both low- and high-dimensional data, and are successfully applied to diverse health-related datasets.
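The wrapper-style feature selection described in the abstract can be illustrated with a minimal sketch. The record does not name the specific MOAs, classifiers, or fitness function used in the article, so everything below is an assumption made for illustration: a small genetic-style search (elitism, tournament selection, uniform crossover, bit-flip mutation) stands in for a metaheuristic, a nearest-centroid classifier stands in for the supervised learner, the data are synthetic, and the names `make_data`, `centroid_accuracy`, `fitness`, `select_features` and the weight `alpha` are hypothetical.

```python
import random

def make_data(n=120, n_features=12, n_informative=3, seed=0):
    """Synthetic two-class data; only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X_tr, y_tr, X_te, y_te, mask):
    """Test accuracy of a nearest-centroid classifier using only the masked features."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, t in zip(X_te, y_te):
        pred = min(centroids,
                   key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(idx, centroids[c])))
        correct += pred == t
    return correct / len(y_te)

def fitness(mask, data, alpha=0.9):
    """Assumed objective: reward accuracy, penalise the fraction of features kept."""
    acc = centroid_accuracy(*data, mask)
    return alpha * acc + (1 - alpha) * (1 - sum(mask) / len(mask))

def select_features(data, n_features, pop_size=12, generations=15, seed=1):
    """Search over binary feature masks; returns the best mask found."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(m, data))
    for _ in range(generations):
        new_pop = [best]  # elitism: the best mask always survives
        while len(new_pop) < pop_size:
            # Exploitation: recombine two tournament winners.
            p1 = max(rng.sample(pop, 3), key=lambda m: fitness(m, data))
            p2 = max(rng.sample(pop, 3), key=lambda m: fitness(m, data))
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Exploration: flip each bit with probability 1 / n_features.
            child = [not g if rng.random() < 1 / n_features else g for g in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=lambda m: fitness(m, data))
    return best

X, y = make_data()
data = (X[:80], y[:80], X[80:], y[80:])  # simple hold-out split
best = select_features(data, n_features=len(X[0]))
print("kept features:", [j for j, keep in enumerate(best) if keep])
print("fitness:", round(fitness(best, data), 3))
```

The weighted objective reflects the trade-off the abstract states (fewer input variables, higher classification performance); the value of `alpha` is arbitrary here and would need tuning on real data.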
first_indexed 2024-04-24T05:42:05Z
format Article
id doaj.art-e048322e5f854259bb76fc935ab175f7
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-24T05:42:05Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-e048322e5f854259bb76fc935ab175f7
2024-04-23T23:00:29Z
eng
IEEE
IEEE Access, vol. 12, pp. 55342-55356, 2024-01-01
ISSN 2169-3536
DOI 10.1109/ACCESS.2024.3390008
Document 10500837
Health-Related Data Analysis Using Metaheuristic Optimization and Machine Learning
Annisa Darmawahyuni (https://orcid.org/0000-0002-0229-5717), Faculty of Engineering, Universitas Sriwijaya, Palembang, Indonesia
Siti Nurmaini (https://orcid.org/0000-0002-8024-2952), Intelligent System Research Group, Universitas Sriwijaya, Palembang, Indonesia
Bambang Tutuko (https://orcid.org/0000-0002-2051-8988), Intelligent System Research Group, Universitas Sriwijaya, Palembang, Indonesia
Muhammad Naufal Rachmatullah (https://orcid.org/0000-0003-3553-3475), Intelligent System Research Group, Universitas Sriwijaya, Palembang, Indonesia
Firdaus Firdaus (https://orcid.org/0000-0003-2791-3486), Intelligent System Research Group, Universitas Sriwijaya, Palembang, Indonesia
Ade Iriani Sapitri, Intelligent System Research Group, Universitas Sriwijaya, Palembang, Indonesia
Anggun Islami, Intelligent System Research Group, Universitas Sriwijaya, Palembang, Indonesia
Jordan Marcelino (https://orcid.org/0009-0002-2499-396X), Intelligent System Research Group, Universitas Sriwijaya, Palembang, Indonesia
Rendy Isdwanta (https://orcid.org/0009-0009-2956-6175), Intelligent System Research Group, Universitas Sriwijaya, Palembang, Indonesia
Muhammad Irfan Karim, Intelligent System Research Group, Universitas Sriwijaya, Palembang, Indonesia
title Health-Related Data Analysis Using Metaheuristic Optimization and Machine Learning
topic Autoencoder
classification
data imputation
feature selection
health-related dataset
metaheuristic algorithms
url https://ieeexplore.ieee.org/document/10500837/