Enhanced Preprocessing Approach Using Ensemble Machine Learning Algorithms for Detecting Liver Disease

Bibliographic Details
Main Authors: Abdul Quadir Md, Sanika Kulkarni, Christy Jackson Joshua, Tejas Vaichole, Senthilkumar Mohan, Celestine Iwendi
Format: Article
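Language: English
Published: MDPI AG 2023-02-01
Series: Biomedicines
Subjects: liver disease, machine learning, multivariate imputation, feature scaling, ensemble learning, gradient boosting
Online Access: https://www.mdpi.com/2227-9059/11/2/581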
author Abdul Quadir Md
Sanika Kulkarni
Christy Jackson Joshua
Tejas Vaichole
Senthilkumar Mohan
Celestine Iwendi
collection DOAJ
description Liver disease has increased sharply worldwide, and many people die without knowing that they have it. Because it presents few symptoms, liver disease is extremely difficult to detect until its final stage. Early detection allows patients to begin treatment sooner, which can save lives. Ensemble learning algorithms have become increasingly popular because they perform better than traditional machine learning algorithms. In this context, this paper proposes a novel architecture based on ensemble learning and enhanced preprocessing to predict liver disease using the Indian Liver Patient Dataset (ILPD). Six ensemble learning algorithms are applied to the ILPD, and their results are compared with those reported in existing studies. The proposed model uses several data preprocessing methods, namely data balancing, feature scaling, and feature selection, together with appropriate imputation, to improve accuracy. Multivariate imputation is applied to fill in missing values. A log1p transformation is applied to skewed columns, along with standardization, min–max scaling, maximum absolute scaling, and robust scaling. Features are selected using several methods, including univariate selection, feature importance, and a correlation matrix. Gradient Boosting, XGBoost, Bagging, Random Forest, Extra Tree, and Stacking ensemble learning algorithms are trained on the preprocessed data. The results of the six models are compared with each other, as well as with the models used in other research works. The proposed models using the extra tree classifier and random forest outperformed the other methods, with the highest testing accuracies of 91.82% and 86.06%, respectively, presenting our method as a real-world solution for detecting liver disease.
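The transformation and scaling steps named in the description can be illustrated with a minimal standard-library Python sketch. It is not the authors' code: the function names and the sample column are hypothetical, and this record does not state which columns, libraries, or parameters the paper actually used.

```python
import math
import statistics

# All helpers assume a non-constant numeric column with no missing values.


def log1p_transform(values):
    """Compress a right-skewed, non-negative column with log(1 + x)."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Map the column linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def max_abs_scale(values):
    """Divide by the largest absolute value, so results lie in [-1, 1]."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# Hypothetical right-skewed lab measurement; these are not ILPD values.
column = [0.4, 0.7, 0.9, 1.1, 1.8, 3.9, 7.5, 22.0]

logged = log1p_transform(column)
scaled = {
    "standard": standardize(logged),
    "min_max": min_max_scale(logged),
    "max_abs": max_abs_scale(logged),
    "robust": robust_scale(logged),
}
```

Applying log1p before scaling reduces the influence of the long right tail, and the robust scaler is less sensitive to any outliers that remain because it relies on the median and quartiles rather than the mean and extremes.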
first_indexed 2024-03-11T09:07:25Z
format Article
id doaj.art-e2d3e1bfb8d94ecc8808b982abd673f6
institution Directory Open Access Journal
issn 2227-9059
language English
last_indexed 2024-03-11T09:07:25Z
publishDate 2023-02-01
publisher MDPI AG
record_format Article
series Biomedicines
spelling doaj.art-e2d3e1bfb8d94ecc8808b982abd673f6
Timestamp: 2023-11-16T19:20:22Z
Language: eng
Publisher: MDPI AG
Journal: Biomedicines (ISSN 2227-9059)
Published: 2023-02-01, volume 11, issue 2, article 581
DOI: 10.3390/biomedicines11020581
Title: Enhanced Preprocessing Approach Using Ensemble Machine Learning Algorithms for Detecting Liver Disease
Authors and affiliations:
Abdul Quadir Md, School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, India
Sanika Kulkarni, School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, India
Christy Jackson Joshua, School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, India
Tejas Vaichole, School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, India
Senthilkumar Mohan, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, India
Celestine Iwendi, School of Creative Technologies, University of Bolton, Bolton BL3 5AB, UK
URL: https://www.mdpi.com/2227-9059/11/2/581
Keywords: liver disease, machine learning, multivariate imputation, feature scaling, ensemble learning, gradient boosting
title Enhanced Preprocessing Approach Using Ensemble Machine Learning Algorithms for Detecting Liver Disease
topic liver disease
machine learning
multivariate imputation
feature scaling
ensemble learning
gradient boosting
url https://www.mdpi.com/2227-9059/11/2/581