Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers

Background: Long non-coding RNA plays a vital role in changing the expression profiles of various target genes that lead to cancer development. Thus, identifying prognostic lncRNAs related to different cancers might help in developing cancer therapy. Method: To discover the critical lncRNAs that can...


Bibliographic Details
Main Authors: Abdullah Al Mamun, Raihanul Bari Tanvir, Masrur Sobhan, Kalai Mathee, Giri Narasimhan, Gregory E. Holt, Ananda Mohan Mondal
Format: Article
Language: English
Published: MDPI AG 2021-11-01
Series: International Journal of Molecular Sciences
Subjects: autoencoder; concrete autoencoder; deep learning; feature selection; lncRNA; mrCAE
Online Access: https://www.mdpi.com/1422-0067/22/21/11919
_version_ 1797512366360887296
author Abdullah Al Mamun
Raihanul Bari Tanvir
Masrur Sobhan
Kalai Mathee
Giri Narasimhan
Gregory E. Holt
Ananda Mohan Mondal
collection DOAJ
description Background: Long non-coding RNA (lncRNA) plays a vital role in changing the expression profiles of various target genes that lead to cancer development. Thus, identifying prognostic lncRNAs related to different cancers might help in developing cancer therapy. Method: To discover the critical lncRNAs that can identify the origin of different cancers, we propose the use of the state-of-the-art deep learning algorithm concrete autoencoder (CAE) in an unsupervised setting, which efficiently identifies a subset of the most informative features. However, because CAE is stochastic, it does not select the same features in different runs. To address this issue, we propose a multi-run CAE (mrCAE) that identifies a stable set of features. The assumption is that a feature appearing in multiple runs carries more meaningful information about the data under consideration. The genome-wide lncRNA expression profiles of 12 different types of cancers, with a total of 4768 samples available in The Cancer Genome Atlas (TCGA), were analyzed to discover the key lncRNAs. The lncRNAs identified by multiple runs of CAE were added to a final list of key lncRNAs that are capable of identifying 12 different cancers. Results: Our results showed that mrCAE performs better in feature selection than single-run CAE, standard autoencoder (AE), and other state-of-the-art feature selection techniques. This study revealed a set of 128 top-ranking lncRNAs that could identify the origin of 12 different cancers with an accuracy of 95%. Survival analysis showed that 76 of the 128 lncRNAs have the prognostic capability to differentiate high- and low-risk groups of patients with different cancers. Conclusion: The proposed mrCAE, which selects actual features, outperformed the AE, even though the AE selects latent or pseudo-features. By selecting actual features instead of pseudo-features, mrCAE can be valuable for precision medicine. The identified prognostic lncRNAs can be further studied to develop therapies for different cancers.
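The multi-run idea in the abstract (keep the features that recur across several stochastic selector runs) can be illustrated with a minimal sketch. This is not the authors' implementation: `mock_selector_run` is a hypothetical stand-in for one CAE run, and the `min_runs` threshold, the feature counts, and the number of runs are assumptions chosen only for illustration.

```python
from collections import Counter
import random


def mock_selector_run(n_features, k, informative, seed):
    """Hypothetical stand-in for one stochastic CAE run (NOT a real CAE).

    Each informative feature is picked with probability 0.9; the remaining
    slots are filled with randomly chosen other features, which mimics the
    run-to-run variation described in the abstract.
    """
    rng = random.Random(seed)
    chosen = {f for f in informative if rng.random() < 0.9}
    others = [f for f in range(n_features) if f not in informative]
    rng.shuffle(others)
    for f in others:
        if len(chosen) >= k:
            break
        chosen.add(f)
    return sorted(chosen)


def stable_features(runs, min_runs):
    """Return (feature, count) pairs for features selected in at least
    `min_runs` runs, ordered by descending count, then by feature index."""
    counts = Counter(f for run in runs for f in set(run))
    kept = [(f, c) for f, c in counts.items() if c >= min_runs]
    return sorted(kept, key=lambda fc: (-fc[1], fc[0]))


# Assumed setup: 1000 candidate features, 20 selected per run, 10 runs,
# with features 0..9 playing the role of the truly informative ones.
runs = [mock_selector_run(1000, 20, informative=range(10), seed=s)
        for s in range(10)]
stable = stable_features(runs, min_runs=5)
print([f for f, _ in stable])
```

Counting each feature at most once per run (via `set(run)`) makes the count a number of runs rather than a number of selections, which matches the stated assumption that a feature appearing in multiple runs is more informative.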
first_indexed 2024-03-10T06:00:46Z
format Article
id doaj.art-7bf70ba08db04196810e51adc7958800
institution Directory Open Access Journal
issn 1661-6596
1422-0067
language English
last_indexed 2024-03-10T06:00:46Z
publishDate 2021-11-01
publisher MDPI AG
record_format Article
series International Journal of Molecular Sciences
spelling doaj.art-7bf70ba08db04196810e51adc7958800
2023-11-22T20:59:58Z
eng
MDPI AG
International Journal of Molecular Sciences
1661-6596
1422-0067
2021-11-01
22(21), 11919
10.3390/ijms222111919
Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers
Abdullah Al Mamun (0): Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
Raihanul Bari Tanvir (1): Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
Masrur Sobhan (2): Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
Kalai Mathee (3): Department of Human and Molecular Genetics, Herbert Wertheim College of Medicine, Florida International University, Miami, FL 33199, USA
Giri Narasimhan (4): Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
Gregory E. Holt (5): Department of Medicine, Miami VA Healthcare System, Miami, FL 33125, USA
Ananda Mohan Mondal (6): Knight Foundation School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
https://www.mdpi.com/1422-0067/22/21/11919
autoencoder; concrete autoencoder; deep learning; feature selection; lncRNA; mrCAE
title Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers
topic autoencoder
concrete autoencoder
deep learning
feature selection
lncRNA
mrCAE
url https://www.mdpi.com/1422-0067/22/21/11919