Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions

Analysis of gene expression data is an attractive topic in bioinformatics, and a typical application is to classify and predict individuals’ diseases or tumors by treating gene expression values as predictors. A primary challenge in this setting comes from ultrahigh dimensionality, which...


Bibliographic Details
Main Author: Li-Pang Chen
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-01-01
Series: PLoS ONE
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9477337/?tool=EBI
author Li-Pang Chen
collection DOAJ
description Analysis of gene expression data is an attractive topic in bioinformatics, and a typical application is to classify and predict individuals’ diseases or tumors by treating gene expression values as predictors. A primary challenge in this setting comes from ultrahigh dimensionality, which implies that (i) many predictors in the dataset might be non-informative and (ii) pairwise dependence structures may exist among the high-dimensional predictors, yielding a network structure. While many supervised learning methods have been developed, prediction performance is expected to suffer if the impacts of ultrahigh dimensionality are not carefully addressed. In this paper, we propose a new statistical learning algorithm for multi-class classification with ultrahigh-dimensional gene expressions. The proposed algorithm employs a model-free feature screening method to retain informative gene expression values from ultrahigh-dimensional data, and then constructs predictive models that accommodate the network structures of the selected gene expressions. Unlike existing supervised learning methods that build predictive models on the entire dataset, our approach is able to identify informative predictors and dependence structures for gene expression. Through analysis of a real dataset, we find that the proposed algorithm gives precise classification as well as accurate prediction, and outperforms some commonly used supervised learning methods.
first_indexed 2024-04-12T20:59:25Z
format Article
id doaj.art-b5514281133846e0a9a25715504658b3
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-04-12T20:59:25Z
publishDate 2022-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
title Classification and prediction for multi-cancer data with ultrahigh-dimensional gene expressions
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9477337/?tool=EBI