A Machine Learning Pipeline for Cancer Detection on Microarray Data: The Role of Feature Discretization and Feature Selection

Bibliographic Details
Main Authors: Adara Nogueira, Artur Ferreira, Mário Figueiredo
Format: Article
Language: English
Published: MDPI AG 2023-08-01
Series: BioMedInformatics
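Subjects: cancer detection; classification; feature discretization; feature selection; gene expression data; machine learning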
Online Access: https://www.mdpi.com/2673-7426/3/3/40
author Adara Nogueira
Artur Ferreira
Mário Figueiredo
collection DOAJ
description Early disease detection using microarray data is vital for prompt and efficient treatment. However, the intricate nature of these data and the ongoing need for more precise interpretation techniques make it a persistently active research field. Numerous gene expression datasets are publicly available, containing microarray data that reflect the activation status of thousands of genes in patients who may have a specific disease. These datasets encompass a vast number of genes, resulting in high-dimensional feature vectors that present significant challenges for human analysis. Consequently, pinpointing the genes frequently associated with a particular disease becomes a crucial task. In this paper, we present a method capable of determining the frequency with which a gene (feature) is selected for the classification of a specific disease, by incorporating feature discretization and selection techniques into a machine learning pipeline. The experimental results demonstrate high accuracy and a low false negative rate, while significantly reducing the data’s dimensionality in the process. The resulting subsets of genes are manageable for clinical experts, enabling them to verify the presence of a given disease.
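The idea summarized in the description (discretize the features, select a subset, and record how often each gene is selected) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: equal-width binning stands in for the discretization step, a mutual-information ranking stands in for the selection step, repeated bootstrap resampling stands in for the repeated runs, and the names discretize, mutual_information, select_top_k, selection_frequency, and make_toy_data are hypothetical. No classifier is trained here; only the selection-frequency count is shown.

```python
import math
import random
from collections import Counter


def discretize(column, n_bins=3):
    """Equal-width binning of one feature column into integer bin indices."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / n_bins or 1.0  # guard against constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_top_k(X, y, k, n_bins=3):
    """Discretize every feature, rank by MI with the label, keep the top k."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((mutual_information(discretize(column, n_bins), y), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def selection_frequency(X, y, k=2, n_rounds=20, seed=0):
    """Count how often each feature index is selected over bootstrap rounds."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(X)
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        counts.update(select_top_k([X[i] for i in idx], [y[i] for i in idx], k))
    return counts


def make_toy_data(n_samples=40, n_noise=4, seed=0):
    """Hypothetical data: feature 0 separates the classes, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [
        [5.0 * label + rng.random()] + [rng.random() for _ in range(n_noise)]
        for label in y
    ]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for feature, count in selection_frequency(X, y).most_common():
        print(f"feature {feature}: selected in {count} of 20 rounds")
```

On real microarray data each column would be a gene, and the resulting counts would indicate which genes are selected most consistently. The bin count, the value of k, and the number of rounds are arbitrary choices in this sketch.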
first_indexed 2024-03-10T23:00:27Z
format Article
id doaj.art-87d60f198a8b481c9f753eebfe9c3197
institution Directory Open Access Journal
issn 2673-7426
language English
last_indexed 2024-03-10T23:00:27Z
publishDate 2023-08-01
publisher MDPI AG
record_format Article
series BioMedInformatics
spelling doaj.art-87d60f198a8b481c9f753eebfe9c3197
2023-11-19T09:43:34Z
eng
MDPI AG
BioMedInformatics
2673-7426
2023-08-01
vol. 3, no. 3, pp. 585-604
10.3390/biomedinformatics3030040
A Machine Learning Pipeline for Cancer Detection on Microarray Data: The Role of Feature Discretization and Feature Selection
Adara Nogueira (ISEL, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, 1959-007 Lisboa, Portugal)
Artur Ferreira (ISEL, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, 1959-007 Lisboa, Portugal)
Mário Figueiredo (IST, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisboa, Portugal)
https://www.mdpi.com/2673-7426/3/3/40
cancer detection; classification; feature discretization; feature selection; gene expression data; machine learning
title A Machine Learning Pipeline for Cancer Detection on Microarray Data: The Role of Feature Discretization and Feature Selection
topic cancer detection
classification
feature discretization
feature selection
gene expression data
machine learning
url https://www.mdpi.com/2673-7426/3/3/40