High-Throughput Omics and Statistical Learning Integration for the Discovery and Validation of Novel Diagnostic Signatures in Colorectal Cancer

The advancement of bioinformatics and machine learning has facilitated the discovery and validation of omics-based biomarkers. This study employed a novel approach combining multi-platform transcriptomics and cutting-edge algorithms to introduce novel signatures for accurate diagnosis of colorectal...

Bibliographic Details
Main Authors: Nguyen Phuoc Long, Seongoh Park, Nguyen Hoang Anh, Tran Diem Nghi, Sang Jun Yoon, Jeong Hill Park, Johan Lim, Sung Won Kwon
Format: Article
Language: English
Published: MDPI AG 2019-01-01
Series: International Journal of Molecular Sciences
Subjects: colorectal cancer, transcriptomics, diagnosis, biomarker, machine learning, variable selection
Online Access: http://www.mdpi.com/1422-0067/20/2/296
_version_ 1811324135263961088
author Nguyen Phuoc Long
Seongoh Park
Nguyen Hoang Anh
Tran Diem Nghi
Sang Jun Yoon
Jeong Hill Park
Johan Lim
Sung Won Kwon
author_facet Nguyen Phuoc Long
Seongoh Park
Nguyen Hoang Anh
Tran Diem Nghi
Sang Jun Yoon
Jeong Hill Park
Johan Lim
Sung Won Kwon
author_sort Nguyen Phuoc Long
collection DOAJ
description The advancement of bioinformatics and machine learning has facilitated the discovery and validation of omics-based biomarkers. This study employed a novel approach combining multi-platform transcriptomics and cutting-edge algorithms to introduce novel signatures for accurate diagnosis of colorectal cancer (CRC). Different random forests (RF)-based feature selection methods including the area under the curve (AUC)-RF, Boruta, and Vita were used and the diagnostic performance of the proposed biosignatures was benchmarked using RF, logistic regression, naïve Bayes, and k-nearest neighbors models. All models showed satisfactory performance in which RF appeared to be the best. For instance, regarding the RF model, the following were observed: mean accuracy 0.998 (standard deviation (SD) < 0.003), mean specificity 0.999 (SD < 0.003), and mean sensitivity 0.998 (SD < 0.004). Moreover, proposed biomarker signatures were highly associated with multifaceted hallmarks in cancer. Some biomarkers were found to be enriched in epithelial cell signaling in Helicobacter pylori infection and inflammatory processes. The overexpression of TGFBI and S100A2 was associated with poor disease-free survival while the down-regulation of NR5A2, SLC4A4, and CD177 was linked to worse overall survival of the patients. In conclusion, novel transcriptome signatures to improve the diagnostic accuracy in CRC are introduced for further validations in various clinical settings.
first_indexed 2024-04-13T14:09:00Z
format Article
id doaj.art-863f845f379a4afe9c135acf513cecdf
institution Directory Open Access Journal
issn 1422-0067
language English
last_indexed 2024-04-13T14:09:00Z
publishDate 2019-01-01
publisher MDPI AG
record_format Article
series International Journal of Molecular Sciences
spelling doaj.art-863f845f379a4afe9c135acf513cecdf
2022-12-22T02:43:50Z
eng
MDPI AG
International Journal of Molecular Sciences
1422-0067
2019-01-01
Volume 20, Issue 2, Article 296
10.3390/ijms20020296
ijms20020296
High-Throughput Omics and Statistical Learning Integration for the Discovery and Validation of Novel Diagnostic Signatures in Colorectal Cancer
Nguyen Phuoc Long (College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea)
Seongoh Park (Department of Statistics, Seoul National University, Seoul 08826, Korea)
Nguyen Hoang Anh (College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea)
Tran Diem Nghi (School of Medicine, Vietnam National University, Ho Chi Minh 70000, Vietnam)
Sang Jun Yoon (College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea)
Jeong Hill Park (College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea)
Johan Lim (Department of Statistics, Seoul National University, Seoul 08826, Korea)
Sung Won Kwon (College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea)
http://www.mdpi.com/1422-0067/20/2/296
colorectal cancer
transcriptomics
diagnosis
biomarker
machine learning
variable selection
spellingShingle Nguyen Phuoc Long
Seongoh Park
Nguyen Hoang Anh
Tran Diem Nghi
Sang Jun Yoon
Jeong Hill Park
Johan Lim
Sung Won Kwon
High-Throughput Omics and Statistical Learning Integration for the Discovery and Validation of Novel Diagnostic Signatures in Colorectal Cancer
International Journal of Molecular Sciences
colorectal cancer
transcriptomics
diagnosis
biomarker
machine learning
variable selection
title High-Throughput Omics and Statistical Learning Integration for the Discovery and Validation of Novel Diagnostic Signatures in Colorectal Cancer
title_sort high throughput omics and statistical learning integration for the discovery and validation of novel diagnostic signatures in colorectal cancer
topic colorectal cancer
transcriptomics
diagnosis
biomarker
machine learning
variable selection
url http://www.mdpi.com/1422-0067/20/2/296
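
The description field of this record reports mean accuracy, specificity, and sensitivity with standard deviations for each benchmarked model. As a rough illustration of how such summary figures are typically derived, the following sketch computes the three metrics from binary predictions and then averages them over repeated runs. It is not the authors' code: the function names, the label coding (1 = CRC, 0 = non-CRC), and the idea of summarizing over repeated evaluation runs are assumptions made for illustration, and no values from the article are reproduced.

```python
from statistics import mean, stdev


def diagnostic_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels.

    Assumes both classes occur in y_true; otherwise a division by zero
    would occur for sensitivity or specificity.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # diseased sample correctly flagged
            else:
                fp += 1  # healthy sample wrongly flagged
        else:
            if truth == positive:
                fn += 1  # diseased sample missed
            else:
                tn += 1  # healthy sample correctly cleared
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def summarize(runs):
    """Mean and sample SD of each metric over at least two runs."""
    summary = {}
    for key in runs[0]:
        values = [run[key] for run in runs]
        summary[key] = (mean(values), stdev(values))
    return summary


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    run_a = diagnostic_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                               [1, 1, 0, 0, 0, 0, 1, 1])
    run_b = diagnostic_metrics([1, 1, 0, 0], [1, 1, 0, 0])
    for name, (avg, sd) in summarize([run_a, run_b]).items():
        print(f"{name}: mean={avg:.3f}, SD={sd:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters for diagnostic signatures because accuracy alone can look high when one class dominates the sample set.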