A Tabular Variational Auto Encoder-Based Hybrid Model for Imbalanced Data Classification With Feature Selection

Cancer is the deadliest disease in humankind. Ovarian Cancer (OC) is important among female-specific cancers. Epithelial Ovarian Cancer (EOC) is the most commonly occurring subtype of OC. The disease is identified in later stages due to the unrevealed symptoms in the early stages. Gene Expression ex...

Bibliographic Details
Main Authors: Asha Abraham, Habeeb Shaik Mohideen, R. Kayalvizhi
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Machine learning; ovarian cancer; pickle; Optuna; TVAE; Boruta
Online Access: https://ieeexplore.ieee.org/document/10304147/
_version_ 1827767736735891456
author Asha Abraham
Habeeb Shaik Mohideen
R. Kayalvizhi
collection DOAJ
description Cancer is one of the deadliest diseases affecting humankind, and Ovarian Cancer (OC) is prominent among female-specific cancers. Epithelial Ovarian Cancer (EOC) is the most common subtype of OC. The disease is usually identified at a late stage because symptoms go unnoticed in the early stages. Gene expression experiments and machine learning (ML) methods can support preventive care for OC by identifying malignant gene transformations earlier and by enabling precision medicine that aids faster recovery. The proposed hybrid model, Tabular Variational Auto Encoder oriented dictionary based Stratified K Fold Cross Validation (TVAE_dict_SKCV), addresses this problem. Its main objective is to assess the significance of EOC screening variables for categorizing high-risk patients. The model first generates synthetic data with TVAE to increase the size of the EOC subtype data taken from the Cancer Cell Line Encyclopedia. The synthesized data are then balanced with the Synthetic Minority Oversampling Technique (SMOTE), and significant features are selected with the Boruta feature selection method. Hyperparameters are fine-tuned with the Optuna optimizer, and the enhanced SKCV is applied with a Random Forest classifier. TVAE_dict_SKCV with Boruta achieved an accuracy of 98.5% and outperformed both the experiment with Lasso feature selection and the experiment on the original data. SHapley Additive exPlanations (SHAP) summarize the main features that drive the classification. Optuna reduced computing time compared with the Grid Search Cross Validation optimizer.
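The description names Stratified K Fold Cross Validation as one stage of the pipeline. As a rough illustration only, the sketch below shows plain stratified k-fold splitting in standard-library Python. It is not the authors' "enhanced" or dictionary-based SKCV, whose details this record does not give; the function name `stratified_k_fold`, the `seed` parameter, and the sample labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror the full label list.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the test set; the remaining folds form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


if __name__ == "__main__":
    # Made-up imbalanced labels: 20 majority (0) and 5 minority (1) samples.
    labels = [0] * 20 + [1] * 5
    for train_idx, test_idx in stratified_k_fold(labels, k=5):
        minority_in_test = sum(labels[i] for i in test_idx)
        print(len(train_idx), len(test_idx), minority_in_test)
```

Stratifying matters for imbalanced data because an unstratified split can leave a test fold with no minority-class samples at all, which makes per-fold accuracy misleading.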
first_indexed 2024-03-11T12:03:07Z
format Article
id doaj.art-e61e6273948c44ce8c4f19df8b51a8cf
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-11T12:03:07Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-e61e6273948c44ce8c4f19df8b51a8cf 2023-11-08T00:00:43Z eng IEEE
IEEE Access, ISSN 2169-3536, 2023-01-01, vol. 11, pp. 122760-122771, DOI 10.1109/ACCESS.2023.3329139, document 10304147
A Tabular Variational Auto Encoder-Based Hybrid Model for Imbalanced Data Classification With Feature Selection
Asha Abraham [0], Habeeb Shaik Mohideen [1], R. Kayalvizhi [2]
https://orcid.org/0000-0001-6803-8951
[0] Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
[1] Department of Genetic Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
[2] Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
https://ieeexplore.ieee.org/document/10304147/
Machine learning; ovarian cancer; pickle; Optuna; TVAE; Boruta
title A Tabular Variational Auto Encoder-Based Hybrid Model for Imbalanced Data Classification With Feature Selection
topic Machine learning
ovarian cancer
pickle
Optuna
TVAE
Boruta
url https://ieeexplore.ieee.org/document/10304147/