Finding Biomarkers from a High-Dimensional Imbalanced Dataset Using the Hybrid Method of Random Undersampling and Lasso

Bibliographic Details
Main Authors: Masithoh Yessi Rochayani, Umu Sa'adah, Ani Budi Astuti
Affiliation: Universitas Brawijaya (all authors)
Format: Article
Language: English
Published: Bina Nusantara University, 2020-12-01
Series: ComTech, vol. 11, no. 2, pp. 75-81
ISSN: 2087-1244, 2476-907X
DOI: 10.21512/comtech.v11i2.6452
Subjects: biomarkers; high-dimensional imbalanced dataset; random undersampling (RUS); lasso hybrid method
Collection: Directory of Open Access Journals (DOAJ), record doaj.art-458ff97101bb4418a5895dd03de8dc94
Online Access: https://journal.binus.ac.id/index.php/comtech/article/view/6452

Description
The research applied undersampling and gene selection as a first step toward cancer classification on gene expression datasets that are high-dimensional and have imbalanced classes. It investigated whether applying undersampling before gene selection gives better results than gene selection without undersampling. The undersampling method used was Random Undersampling (RUS), and the gene selection method was Lasso. The selected genes were then validated against existing theory. To explore the effectiveness of applying RUS before gene selection, the researchers used two gene expression datasets. Both datasets consisted of two classes, 1,545 observations, and 10,935 genes, but differed in their imbalance ratio. The results show that the proposed gene selection methods, namely Lasso and RUS + Lasso, can identify several important biomarkers, and the resulting model has high accuracy. However, the model is complex because it involves too many genes. The research also finds that undersampling makes no difference when it is applied to a less imbalanced dataset. Meanwhile, when the dataset is highly imbalanced, undersampling can remove a large amount of information from the majority class. Overall, the effectiveness of undersampling remains unclear. Future research could carry out simulation studies to determine when undersampling should be applied.
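The RUS + Lasso pipeline summarized in the description can be sketched as follows. This is a minimal, hypothetical illustration in pure Python, not the authors' implementation. The abstract does not state which Lasso variant or tuning procedure was used, so the sketch assumes a least-squares Lasso on a 0/1-coded class label fitted by coordinate descent. The study may instead have used L1-penalized logistic regression. The penalty `lam`, the toy data generator, and all function names are illustrative assumptions. A gene counts as "selected" when its coefficient is nonzero.

```python
import random


def imbalance_ratio(y):
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = [y.count(label) for label in set(y)]
    return max(counts) / min(counts)


def random_undersample(X, y, seed=0):
    """Random Undersampling: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))  # sampling without replacement
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def standardize(X):
    """Center each column and scale it to unit (population) variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(var ** 0.5 or 1.0)  # constant columns become all zeros
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 (assumed least-squares variant)."""
    Xs = standardize(X)
    n, p = len(Xs), len(Xs[0])
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]  # residual for b = 0 (intercept = mean of y)
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # covariance of column j with the partial residual that excludes gene j
            rho = sum(Xs[i][j] * r[i] for i in range(n)) / n + b[j]
            new_bj = soft_threshold(rho, lam)
            if new_bj != b[j]:
                delta = new_bj - b[j]
                for i in range(n):
                    r[i] -= Xs[i][j] * delta
                b[j] = new_bj
    return b


def select_genes(X, y, lam):
    """Indices of genes (columns) whose Lasso coefficient is nonzero."""
    return [j for j, bj in enumerate(lasso(X, y, lam)) if bj != 0.0]


def make_toy_data(n_major=90, n_minor=10, p=6, seed=1):
    """Hypothetical imbalanced data: only gene 0 is shifted in the minority class."""
    rng = random.Random(seed)
    X, y = [], []
    for label, count in ((0, n_major), (1, n_minor)):
        for _ in range(count):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            row[0] += 3.0 * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("imbalance ratio before RUS:", imbalance_ratio(y))
    X_rus, y_rus = random_undersample(X, y, seed=0)
    print("imbalance ratio after RUS:", imbalance_ratio(y_rus))
    print("genes selected by RUS + Lasso:", select_genes(X_rus, y_rus, lam=0.1))
    print("genes selected by Lasso alone:", select_genes(X, y, lam=0.1))
```

Fixing `seed` makes the undersample reproducible. Because RUS discards majority-class rows at random, different seeds can yield different gene sets, which is one way the information loss mentioned in the abstract can show up. Note that `lam` acts on the scale of standardized covariances, so its useful range also depends on the class balance.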