A Microarray Data Pre-processing Method for Cancer Classification

The development of microarray technology has led to significant improvements and research in various fields. With the help of machine learning techniques and statistical methods, it is now possible to organize, analyze, and interpret large amounts of biological data to uncover significant patterns o...

Bibliographic Details
Main Authors: Tay Xin Hui, Shahreen Kasim, Mohd Farhan Md Fudzee, Zubaile Abdullah, Rohayanti Hassan, Aldo Erianda
Format: Article
Language: English
Published: Politeknik Negeri Padang 2022-12-01
Series: JOIV: International Journal on Informatics Visualization
Subjects: data pre-processing; microarray data; gene expression data; genepattern
Online Access: https://joiv.org/index.php/joiv/article/view/1523
author Tay Xin Hui
Shahreen Kasim
Mohd Farhan Md Fudzee
Zubaile Abdullah
Rohayanti Hassan
Aldo Erianda
author_sort Tay Xin Hui
collection DOAJ
description The development of microarray technology has led to significant improvements and research in various fields. With the help of machine learning techniques and statistical methods, it is now possible to organize, analyze, and interpret large amounts of biological data to uncover significant patterns of interest. Exploiting microarray data remains a major challenge for many researchers. Raw gene expression data are often affected by missing values, noise, incompleteness, and inconsistency. Hence, it is important to pre-process the data before using them for cancer classification. To extract the biological significance of microarray gene expression data, data pre-processing is a necessary step for obtaining valuable information for further analysis and for addressing important hypotheses. This study presents a detailed description of a data pre-processing method for cancer classification. The proposed method consists of three phases: data cleaning, transformation, and filtering. The GenePattern software tool and RStudio were used together to implement the proposed method. To demonstrate its feasibility for cancer classification, the method was applied to six gene expression datasets: lung, stomach, liver, kidney, thyroid, and breast cancer. A comparison illustrates the differences between the datasets before and after data pre-processing.
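The three phases named in the description (data cleaning, transformation, and filtering) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the record says used GenePattern and RStudio. The specific choices here are assumptions made only for illustration: mean imputation for missing values, flooring followed by a log2 transform, and a standard-deviation filter. The function names, the `floor` and `min_sd` parameters, and the toy gene names are all hypothetical.

```python
import math
from statistics import mean, pstdev


def clean(expr):
    """Cleaning (assumed): fill missing values (None) with the gene's mean.

    Genes with no observed values at all are dropped.
    """
    cleaned = {}
    for gene, values in expr.items():
        observed = [v for v in values if v is not None]
        if not observed:
            continue  # nothing to impute from
        fill = mean(observed)
        cleaned[gene] = [fill if v is None else v for v in values]
    return cleaned


def transform(expr, floor=1.0):
    """Transformation (assumed): clamp values at `floor`, then take log2."""
    return {g: [math.log2(max(v, floor)) for v in vals]
            for g, vals in expr.items()}


def filter_genes(expr, min_sd=0.5):
    """Filtering (assumed): keep genes whose population standard deviation
    across samples is at least `min_sd`."""
    return {g: vals for g, vals in expr.items() if pstdev(vals) >= min_sd}


def preprocess(expr, floor=1.0, min_sd=0.5):
    """Run the three phases in order: clean, transform, filter."""
    return filter_genes(transform(clean(expr), floor), min_sd)


# Toy input: gene name -> expression values across four samples.
raw = {
    "GENE_A": [8.0, None, 64.0, 512.0],    # one missing value
    "GENE_B": [16.0, 16.0, 16.0, 16.0],    # no variation across samples
    "GENE_C": [None, None, None, None],    # no data at all
    "GENE_D": [0.5, 4.0, 32.0, 2.0],       # one value below the floor
}
processed = preprocess(raw)
print(sorted(processed))
```

In this toy input, GENE_C is dropped during cleaning because it has no observed values, and GENE_B is dropped during filtering because it does not vary across samples. Real thresholds would depend on the platform and dataset.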
first_indexed 2024-04-10T05:47:07Z
format Article
id doaj.art-0c929d71c46a4fca98c50085d028bc1d
institution Directory Open Access Journal
issn 2549-9610
2549-9904
language English
last_indexed 2024-04-10T05:47:07Z
publishDate 2022-12-01
publisher Politeknik Negeri Padang
record_format Article
series JOIV: International Journal on Informatics Visualization
spelling doaj.art-0c929d71c46a4fca98c50085d028bc1d
2023-03-05T10:28:41Z
eng
Politeknik Negeri Padang
JOIV: International Journal on Informatics Visualization
2549-9610
2549-9904
2022-12-01
6
4
784
790
10.30630/joiv.6.4.1523
444
A Microarray Data Pre-processing Method for Cancer Classification
Tay Xin Hui, Universiti Tun Hussein Onn Malaysia, Parit Raja 86400, Johor, Malaysia
Shahreen Kasim, Universiti Tun Hussein Onn Malaysia, Parit Raja 86400, Johor, Malaysia
Mohd Farhan Md Fudzee, Universiti Tun Hussein Onn Malaysia, Parit Raja 86400, Johor, Malaysia
Zubaile Abdullah, Universiti Tun Hussein Onn Malaysia, Parit Raja 86400, Johor, Malaysia
Rohayanti Hassan, Universiti Teknologi Malaysia, 83100, Johor, Malaysia
Aldo Erianda, Politeknik Negeri Padang, Sumatera Barat, Indonesia
https://joiv.org/index.php/joiv/article/view/1523
data pre-processing
microarray data
gene expression data
genepattern
title A Microarray Data Pre-processing Method for Cancer Classification
topic data pre-processing
microarray data
gene expression data
genepattern
url https://joiv.org/index.php/joiv/article/view/1523