Label propagation-based semi-supervised feature selection on decoding clinical phenotypes with RNA-seq data

Abstract Background Clinically, behavioral, cognitive, and mental functions are affected during neurodegenerative disease progression. To date, the molecular pathogenesis of these complex diseases is still unclear. With the rapid development of sequencing technologies, it is possible to delicately...


Bibliographic Details
Main Authors: Xue Jiang, Miao Chen, Weichen Song, Guan Ning Lin
Format: Article
Language: English
Published: BMC 2021-08-01
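Series: BMC Medical Genomics
Subjects: Biomarkers that corresponding to clinical phenotypes; Label propagation clustering; Feature selection
Online Access: https://doi.org/10.1186/s12920-021-00985-0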
author Xue Jiang
Miao Chen
Weichen Song
Guan Ning Lin
collection DOAJ
description Abstract
Background: Clinically, behavioral, cognitive, and mental functions are affected during neurodegenerative disease progression. To date, the molecular pathogenesis of these complex diseases remains unclear. With the rapid development of sequencing technologies, it is now possible to decode the molecular mechanisms underlying different clinical phenotypes at the genome-wide transcriptomic level using computational methods. Our previous studies have shown that it is difficult to distinguish disease genes from non-disease genes. Therefore, to explore the molecular pathogenesis underlying complex clinical phenotypes precisely, it is better to identify biomarkers corresponding to different disease stages or clinical phenotypes. In this study, we designed a label propagation-based semi-supervised feature selection approach (LPFS) to prioritize disease-associated genes corresponding to different disease stages or clinical phenotypes.
Methods: We pioneered the integration of label propagation clustering and feature selection into one framework and proposed a label propagation-based semi-supervised feature selection approach. LPFS prioritizes disease genes related to different disease stages or phenotypes by alternating between label propagation clustering on a sample network and feature selection on gene expression profiles. GO and KEGG pathway enrichment analyses, together with gene functional analysis, were then carried out to explore the molecular mechanisms of specific disease phenotypes, and thus to decode the changes in individual behavioral and mental characteristics during neurodegenerative disease progression.
Results: Extensive experiments were conducted to verify the performance of LPFS on Huntington’s disease gene expression data. The experimental results showed that LPFS performs better than state-of-the-art methods. GO and KEGG enrichment analysis of key gene sets showed that the TGF-beta signaling pathway, cytokine-cytokine receptor interaction, immune response, and inflammatory response were gradually affected during Huntington’s disease progression. In addition, we found that the expression of SLC4A11, ZFP474, AMBP, TOP2A, PBK, CCDC33, APSL, DLGAP5, and Al662270 changed markedly as the disease developed.
Conclusions: In this study, we designed a label propagation-based semi-supervised feature selection model to precisely select key genes for different disease phenotypes. We applied the model to gene expression data from Huntington’s disease mice to decode the mechanisms of the disease. We found that many cell types, including astrocytes, microglia, and GABAergic neurons, could be involved in the pathological process.
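The alternation described in the Methods part of the abstract (label propagation on a sample network, then feature selection on expression profiles, repeated) can be illustrated with a small sketch. This is not the authors' LPFS formulation, which the abstract does not specify. It assumes a Gaussian-weighted k-nearest-neighbour sample graph, a standard normalized-graph label spreading update, and a Fisher-score gene ranking as stand-ins. All function names and parameters (`knn_affinity`, `propagate_labels`, `fisher_scores`, `lpfs_sketch`, `alpha`, `k`, `n_genes`, `n_rounds`) are hypothetical.

```python
import numpy as np

def knn_affinity(X, k=5):
    """Symmetric kNN graph over samples (rows of X) with Gaussian edge weights."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    sigma2 = np.median(d2[d2 > 0])           # bandwidth heuristic (assumption)
    W = np.exp(-d2 / sigma2)
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, :k]       # k strongest neighbours per sample
    mask = np.zeros_like(W, dtype=bool)
    mask[np.arange(len(X))[:, None], idx] = True
    mask |= mask.T                            # keep an edge if either end chose it
    return W * mask

def propagate_labels(W, y, n_classes, alpha=0.9, n_iter=50):
    """Spread known labels over the graph; y holds -1 for unlabeled samples."""
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0
    S = W / np.sqrt(np.outer(deg, deg))       # D^{-1/2} W D^{-1/2}
    rows = np.flatnonzero(y >= 0)
    Y0 = np.zeros((len(y), n_classes))
    Y0[rows, y[rows]] = 1.0
    F = Y0.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y0
    pred = F.argmax(axis=1)
    pred[rows] = y[rows]                      # known labels stay fixed
    return pred

def fisher_scores(X, labels):
    """Per-gene ratio of between-class to within-class scatter."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def lpfs_sketch(X, y, n_classes, n_genes=10, n_rounds=5, k=5):
    """Alternate label propagation and gene ranking for a fixed number of rounds."""
    selected = np.arange(X.shape[1])          # start from all genes
    for _ in range(n_rounds):
        W = knn_affinity(X[:, selected], k=k)         # graph from current genes
        labels = propagate_labels(W, y, n_classes)    # fill in missing labels
        scores = fisher_scores(X, labels)             # rank every gene again
        selected = np.argsort(-scores)[:n_genes]
    return np.sort(selected), labels
```

Here `X` would be a samples-by-genes expression matrix and `y` an integer vector of disease stages or phenotypes with `-1` marking unlabeled samples; the function returns the indices of the top-ranked genes and the propagated labels. A fixed number of rounds is used for simplicity, whereas a real implementation would more likely optimize a joint objective until convergence.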
first_indexed 2024-12-22T04:57:49Z
format Article
id doaj.art-d5dcf21af2f34ef1a2177982c1470d5b
institution Directory Open Access Journal
issn 1755-8794
language English
last_indexed 2024-12-22T04:57:49Z
publishDate 2021-08-01
publisher BMC
record_format Article
series BMC Medical Genomics
spelling doaj.art-d5dcf21af2f34ef1a2177982c1470d5b
2022-12-21T18:38:20Z
eng
BMC
BMC Medical Genomics
1755-8794
2021-08-01
14S1111
10.1186/s12920-021-00985-0
Label propagation-based semi-supervised feature selection on decoding clinical phenotypes with RNA-seq data
Xue Jiang
Miao Chen
Weichen Song
Guan Ning Lin
Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University (same affiliation listed for all four authors)
https://doi.org/10.1186/s12920-021-00985-0
Biomarkers that corresponding to clinical phenotypes
Label propagation clustering
Feature selection
title Label propagation-based semi-supervised feature selection on decoding clinical phenotypes with RNA-seq data
topic Biomarkers that corresponding to clinical phenotypes
Label propagation clustering
Feature selection
url https://doi.org/10.1186/s12920-021-00985-0