A Machine Learning Pipeline for Cancer Detection on Microarray Data: The Role of Feature Discretization and Feature Selection
Early disease detection using microarray data is vital for prompt and efficient treatment. However, the intricate nature of these data and the ongoing need for more precise interpretation techniques make it a persistently active research field. Numerous gene expression datasets are publicly availabl...
Main Authors: | Adara Nogueira, Artur Ferreira, Mário Figueiredo |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-08-01 |
Series: | BioMedInformatics |
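Subjects: | cancer detection; classification; feature discretization; feature selection; gene expression data; machine learning |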
Online Access: | https://www.mdpi.com/2673-7426/3/3/40 |
_version_ | 1827727014142935040 |
---|---|
author | Adara Nogueira; Artur Ferreira; Mário Figueiredo |
author_sort | Adara Nogueira |
collection | DOAJ |
description | Early disease detection using microarray data is vital for prompt and efficient treatment. However, the intricate nature of these data and the ongoing need for more precise interpretation techniques make it a persistently active research field. Numerous gene expression datasets are publicly available, containing microarray data that reflect the activation status of thousands of genes in patients who may have a specific disease. These datasets encompass a vast number of genes, resulting in high-dimensional feature vectors that present significant challenges for human analysis. Consequently, pinpointing the genes frequently associated with a particular disease becomes a crucial task. In this paper, we present a method capable of determining the frequency with which a gene (feature) is selected for the classification of a specific disease, by incorporating feature discretization and selection techniques into a machine learning pipeline. The experimental results demonstrate high accuracy and a low false negative rate, while significantly reducing the data’s dimensionality in the process. The resulting subsets of genes are manageable for clinical experts, enabling them to verify the presence of a given disease. |
first_indexed | 2024-03-10T23:00:27Z |
format | Article |
id | doaj.art-87d60f198a8b481c9f753eebfe9c3197 |
institution | Directory of Open Access Journals |
issn | 2673-7426 |
language | English |
last_indexed | 2024-03-10T23:00:27Z |
publishDate | 2023-08-01 |
publisher | MDPI AG |
record_format | Article |
series | BioMedInformatics |
spelling | doaj.art-87d60f198a8b481c9f753eebfe9c3197; 2023-11-19T09:43:34Z; eng; MDPI AG; BioMedInformatics; ISSN 2673-7426; 2023-08-01; vol. 3, no. 3, pp. 585-604; DOI 10.3390/biomedinformatics3030040; Adara Nogueira: ISEL, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, 1959-007 Lisboa, Portugal; Artur Ferreira: ISEL, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, 1959-007 Lisboa, Portugal; Mário Figueiredo: IST, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisboa, Portugal |
title | A Machine Learning Pipeline for Cancer Detection on Microarray Data: The Role of Feature Discretization and Feature Selection |
topic | cancer detection; classification; feature discretization; feature selection; gene expression data; machine learning |
url | https://www.mdpi.com/2673-7426/3/3/40 |
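
The description above refers to counting how often a gene (feature) is selected once feature discretization and feature selection are placed in a machine learning pipeline. The following is a minimal sketch of that idea only, not the article's method: the equal-width binning, the mutual-information ranking, the leave-one-out resampling, every function name, and the toy data are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def discretize_equal_width(values, n_bins=3):
    """Map continuous values to bin indices 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant feature: a single bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_top_k(rows, labels, k, n_bins=3):
    """Discretize each feature, score it against the labels, keep the k best."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        bins = discretize_equal_width(column, n_bins)
        scores.append((mutual_information(bins, labels), j))
    # Highest score first; ties go to the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


def selection_frequency(rows, labels, k, n_bins=3):
    """Count how often each feature is selected across leave-one-out subsets."""
    counts = Counter()
    for held_out in range(len(rows)):
        sub_rows = rows[:held_out] + rows[held_out + 1:]
        sub_labels = labels[:held_out] + labels[held_out + 1:]
        counts.update(select_top_k(sub_rows, sub_labels, k, n_bins))
    return counts


# Hypothetical toy data: 6 samples x 3 "genes"; only gene 0 tracks the label.
rows = [
    [0.1, 5.0, 2.2],
    [0.2, 4.1, 2.9],
    [0.3, 5.3, 2.1],
    [0.9, 4.4, 2.8],
    [1.0, 5.1, 2.3],
    [1.1, 4.2, 2.7],
]
labels = [0, 0, 0, 1, 1, 1]

frequency = selection_frequency(rows, labels, k=1)
print(frequency.most_common())
```

In this toy setup gene 0 is selected in all six leave-one-out runs, so its count equals the number of runs. On real microarray data the same counter would give a ranked list of frequently selected genes, although the discretizer, the relevance measure, and the resampling scheme would need to follow the article rather than this sketch.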