Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer

Bibliographic Details
Main Authors: Erkan Bostanci, Engin Kocak, Metehan Unal, Mehmet Serdar Guzel, Koray Acici, Tunc Asuroglu
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Sensors
Subjects: transcriptomics; RNA-seq; machine learning; deep learning; classification; cancer prediction
Online Access:https://www.mdpi.com/1424-8220/23/6/3080
collection DOAJ
description Data from omics studies have been used to predict and classify various diseases in biomedical and bioinformatics research. In recent years, Machine Learning (ML) algorithms have been applied in many healthcare-related fields, especially to disease prediction and classification tasks. Integrating molecular omics data with ML algorithms offers a valuable opportunity for evaluating clinical data. RNA sequencing (RNA-seq) has emerged as the gold standard for transcriptomic analysis and is now widely used in clinical research. In this work, RNA-seq data of extracellular vesicles (EVs) from healthy individuals and colon cancer patients are analyzed. Our aim is to develop models that predict colon cancer and classify its stages. Five canonical ML classifiers and three Deep Learning (DL) models are used to predict colon cancer in an individual from processed RNA-seq data. The data classes are formed in two ways: by cancer presence (healthy or cancer) and by colon cancer stage. The canonical ML classifiers, k-Nearest Neighbor (kNN), Logistic Model Tree (LMT), Random Tree (RT), Random Committee (RC), and Random Forest (RF), are tested on both forms of the data. For comparison with these canonical ML models, One-Dimensional Convolutional Neural Network (1-D CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM) DL models are also used, with their hyperparameters optimized by a genetic algorithm (GA), a meta-heuristic optimization method. Among the canonical ML algorithms, the best cancer-prediction accuracy, 97.33%, is obtained with RC, LMT, and RF, while RT and kNN reach 95.33%. In cancer stage classification, RF achieves the best accuracy at 97.33%, followed by LMT, RC, kNN, and RT with 96.33%, 96%, 94.66%, and 94%, respectively.
In the DL experiments, the best cancer-prediction accuracy is obtained with 1-D CNN at 97.67%, while BiLSTM and LSTM reach 94.33% and 93.67%, respectively. In cancer stage classification, BiLSTM achieves the best accuracy at 98%, followed by 1-D CNN and LSTM at 97% and 94.33%, respectively. The results indicate that either canonical ML or DL models may perform better, depending on the number of features used.
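The abstract says only that a genetic algorithm (GA) tunes the DL hyperparameters; it does not give the encoding, genetic operators, or search ranges. The sketch below is therefore a generic, hedged illustration of GA-based hyperparameter search, not the authors' implementation. Everything in it is an assumption: the search space (`filters`, `kernel_size`, `units`, `learning_rate`), tournament selection, uniform crossover, per-gene mutation, elitism, and the helper names `genetic_search` and `toy_fitness` are all hypothetical. In a real pipeline, `toy_fitness` would be replaced by a function that trains a model such as the 1-D CNN with the given configuration and returns its validation accuracy.

```python
import math
import random

# HYPOTHETICAL search space: the abstract does not state which DL
# hyperparameters were tuned or their ranges; these are illustrative only.
SEARCH_SPACE = {
    "filters": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
    "units": [32, 64, 128],
    "learning_rate": [1e-4, 1e-3, 1e-2],
}
KEYS = list(SEARCH_SPACE)


def random_individual(rng):
    """Sample one configuration (chromosome) uniformly from the search space."""
    return {k: rng.choice(SEARCH_SPACE[k]) for k in KEYS}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in KEYS}


def mutate(ind, rng, rate=0.1):
    """Resample each gene independently with probability `rate`."""
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in ind.items()}


def tournament(pop, scores, rng, k=3):
    """Tournament selection: return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def genetic_search(fitness, pop_size=10, generations=15, seed=0):
    """Maximize `fitness` over SEARCH_SPACE; requires pop_size >= 3."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for gen in range(generations):
        scores = [fitness(ind) for ind in pop]
        for ind, s in zip(pop, scores):
            if s > best_score:
                best, best_score = ind, s
        if gen == generations - 1:
            break  # final population has been scored; no need to breed again
        # Elitism: the best configuration seen so far survives unchanged.
        children = [best]
        while len(children) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    return best, best_score


def toy_fitness(cfg):
    """HYPOTHETICAL stand-in for validation accuracy (maximum value 1.0).

    In practice this would train, e.g., a 1-D CNN with `cfg` and return its
    accuracy on held-out data.
    """
    return (1.0
            - abs(cfg["filters"] - 64) / 128
            - abs(cfg["kernel_size"] - 5) / 10
            - abs(cfg["units"] - 64) / 256
            - abs(math.log10(cfg["learning_rate"]) + 3) / 4)


if __name__ == "__main__":
    best, score = genetic_search(toy_fitness)
    print(best, round(score, 3))
```

Because every fitness evaluation would mean training a network, population size and number of generations are the main cost levers in such a search; elitism ensures the best configuration found so far is never lost between generations.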
first_indexed 2024-03-11T05:55:56Z
id doaj.art-0ba0696785e94b559660448b969b60c8
institution Directory of Open Access Journals
issn 1424-8220
last_indexed 2024-03-11T05:55:56Z
DOI: 10.3390/s23063080
Volume 23, Issue 6, Article 3080
Affiliations:
Erkan Bostanci: Department of Computer Engineering, Faculty of Engineering, Ankara University, 06830 Ankara, Turkey
Engin Kocak: Department of Analytical Chemistry, Faculty of Gülhane Pharmacy, University of Health Sciences, 06018 Ankara, Turkey
Metehan Unal: Department of Computer Engineering, Faculty of Engineering, Ankara University, 06830 Ankara, Turkey
Mehmet Serdar Guzel: Department of Computer Engineering, Faculty of Engineering, Ankara University, 06830 Ankara, Turkey
Koray Acici: Department of Artificial Intelligence and Data Engineering, Faculty of Engineering, Ankara University, 06830 Ankara, Turkey
Tunc Asuroglu: Faculty of Medicine and Health Technology, Tampere University, 33720 Tampere, Finland
topic transcriptomics
RNA-seq
machine learning
deep learning
classification
cancer prediction