A Tabular Variational Auto Encoder-Based Hybrid Model for Imbalanced Data Classification With Feature Selection
Cancer is the deadliest disease in humankind. Ovarian Cancer (OC) is important among female-specific cancers. Epithelial Ovarian Cancer (EOC) is the most commonly occurring subtype of OC. The disease is identified in later stages due to the unrevealed symptoms in the early stages. Gene Expression ex...
Main Authors: | Asha Abraham, Habeeb Shaik Mohideen, R. Kayalvizhi |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Machine learning, ovarian cancer, pickle, Optuna, TVAE, Boruta |
Online Access: | https://ieeexplore.ieee.org/document/10304147/ |
_version_ | 1827767736735891456 |
---|---|
author | Asha Abraham; Habeeb Shaik Mohideen; R. Kayalvizhi |
author_facet | Asha Abraham; Habeeb Shaik Mohideen; R. Kayalvizhi |
author_sort | Asha Abraham |
collection | DOAJ |
description | Cancer is one of the deadliest diseases affecting humankind. Ovarian cancer (OC) is prominent among female-specific cancers, and epithelial ovarian cancer (EOC) is its most common subtype. The disease is usually identified at a late stage because early-stage symptoms go unnoticed. Gene expression experiments and machine learning (ML) methods can support preventive care for OC by identifying malignant gene transformations earlier and enabling precision medicine that aids fast recovery. The proposed hybrid model, Tabular Variational Auto Encoder oriented dictionary-based Stratified K-Fold Cross Validation (TVAE_dict_SKCV), addresses this problem. The main objective is to assess the significance of EOC screening variables for categorizing high-risk patients. First, the TVAE model generates synthetic data to enlarge the EOC subtype data taken from the Cancer Cell Line Encyclopedia. The synthesized data are then balanced with the Synthetic Minority Oversampling Technique (SMOTE). Significant features are selected with the Boruta feature selection method. The hyperparameters are tuned with the Optuna optimizer, and the enhanced SKCV is applied with a Random Forest classifier. TVAE_dict_SKCV with Boruta achieved an accuracy of 98.5% and outperformed the experiments with Lasso feature selection and with the original data. SHapley Additive exPlanations (SHAP) summarize the features that contribute most to the classification. Optuna reduced computing time compared with the Grid Search Cross Validation optimizer. |
first_indexed | 2024-03-11T12:03:07Z |
format | Article |
id | doaj.art-e61e6273948c44ce8c4f19df8b51a8cf |
institution | Directory of Open Access Journals |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-11T12:03:07Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-e61e6273948c44ce8c4f19df8b51a8cf; 2023-11-08T00:00:43Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11; pp. 122760-122771; DOI 10.1109/ACCESS.2023.3329139; document 10304147; A Tabular Variational Auto Encoder-Based Hybrid Model for Imbalanced Data Classification With Feature Selection; Asha Abraham (0), Habeeb Shaik Mohideen (1), R. Kayalvizhi (2); https://orcid.org/0000-0001-6803-8951; affiliations in listed order: Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, India; Department of Genetic Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Chennai, India; Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, India; https://ieeexplore.ieee.org/document/10304147/; Machine learning, ovarian cancer, pickle, Optuna, TVAE, Boruta |
title | A Tabular Variational Auto Encoder-Based Hybrid Model for Imbalanced Data Classification With Feature Selection |
topic | Machine learning; ovarian cancer; pickle; Optuna; TVAE; Boruta |
url | https://ieeexplore.ieee.org/document/10304147/ |
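
The description field names stratified K-fold cross-validation (SKCV) as the evaluation scheme for imbalanced data. The sketch below illustrates only the general idea of a stratified split and is not the authors' TVAE_dict_SKCV code, which this record does not include. The function name `stratified_k_fold`, the toy labels, and the seed are illustrative assumptions; the example uses only the Python standard library.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class ratio."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin across folds
    # so every fold receives its share of each class.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            j for f, fold in enumerate(folds) if f != i for j in fold
        )
        yield train_idx, test_idx


# Toy labels (made-up data): 20 majority samples and 5 minority samples.
labels = ["majority"] * 20 + ["minority"] * 5
splits = list(stratified_k_fold(labels, k=5))

for train_idx, test_idx in splits:
    minority_in_test = sum(labels[i] == "minority" for i in test_idx)
    print(len(train_idx), len(test_idx), minority_in_test)
```

With these toy labels every test fold holds one minority sample, whereas a plain random split could leave some folds with none. A practical pipeline would more likely rely on an established library implementation than on a hand-written splitter like this one.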