Identification of useful genes from multiple microarrays for ulcerative colitis diagnosis based on machine learning methods

Abstract Ulcerative colitis (UC) is a chronic relapsing inflammatory bowel disease with increasing incidence and prevalence worldwide. The diagnosis of UC relies mainly on clinical symptoms and laboratory examinations. As previous studies have revealed an association between g...


Bibliographic Details
Main Authors: Lin Zhang, Rui Mao, Chung Tai Lau, Wai Chak Chung, Jacky C. P. Chan, Feng Liang, Chenchen Zhao, Xuan Zhang, Zhaoxiang Bian
Format: Article
Language: English
Published: Nature Portfolio 2022-06-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-022-14048-6
author Lin Zhang
Rui Mao
Chung Tai Lau
Wai Chak Chung
Jacky C. P. Chan
Feng Liang
Chenchen Zhao
Xuan Zhang
Zhaoxiang Bian
collection DOAJ
description Abstract Ulcerative colitis (UC) is a chronic relapsing inflammatory bowel disease with increasing incidence and prevalence worldwide. The diagnosis of UC relies mainly on clinical symptoms and laboratory examinations. Because previous studies have revealed an association between gene expression signatures and disease severity, we aimed to assess whether genes can help to diagnose UC and to predict its correlation with immune regulation. Ten eligible microarrays (comprising 387 UC patients and 139 healthy subjects) were included in this study: six (GSE48634, GSE6731, GSE114527, GSE13367, GSE36807, and GSE3629) in the training group and four (GSE53306, GSE87473, GSE74265, and GSE96665) in the testing group. After data processing, we found 87 differentially expressed genes. Six machine learning methods (support vector machine, least absolute shrinkage and selection operator, random forest, gradient boosting machine, principal component analysis, and neural network) were then used to identify potentially useful genes. The synthetic minority oversampling technique (SMOTE) was used to adjust for any imbalance in sample size between the two groups. Six genes were consequently selected for model establishment. Based on receiver operating characteristic analysis, two genes, OLFM4 and C4BPB, were finally identified. The average area under the curve for these two genes was higher than 0.8 in both the original and the SMOTE-adjusted datasets. In addition, these two genes correlated significantly with six immune cell types, namely Macrophages M1, Macrophages M2, Mast cells activated, Mast cells resting, Monocytes, and NK cells activated (P < 0.05). OLFM4 and C4BPB may help to identify patients with UC. Further verification studies are warranted.
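The two evaluation ideas named in the abstract, SMOTE-style oversampling and the area under the ROC curve, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the expression values are invented, the function names `auc` and `smote_1d` are hypothetical, and the oversampling is a simplified single-feature, one-nearest-neighbour variant rather than the procedure or software the authors used.

```python
import random


def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the fraction of (positive, negative) pairs in which the positive
    sample has the higher score, counting ties as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def smote_1d(minority, n_new, rng):
    """Simplified SMOTE for a single feature: each synthetic value is a
    random interpolation between a minority sample and its nearest
    neighbour among the other minority samples."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbour = min(
            (v for j, v in enumerate(minority) if j != i),
            key=lambda v: abs(v - x),
        )
        synthetic.append(x + rng.random() * (neighbour - x))
    return synthetic


# Invented expression values for one gene (not data from the study).
uc = [8.1, 7.4, 9.0, 6.8, 8.6, 7.9]   # majority class: UC patients
healthy = [5.2, 6.9, 4.8]             # minority class: healthy subjects

auc_original = auc(uc, healthy)

# Oversample the minority class until both groups have the same size.
rng = random.Random(0)
healthy_balanced = healthy + smote_1d(healthy, len(uc) - len(healthy), rng)
auc_smote = auc(uc, healthy_balanced)

print(f"AUC, original data:  {auc_original:.3f}")
print(f"AUC, SMOTE-adjusted: {auc_smote:.3f}")
```

Comparing the two AUC values mirrors the abstract's check that a gene's discriminative ability holds in both the original and the SMOTE-adjusted datasets. Full SMOTE interpolates in the multi-gene feature space using k nearest neighbours, which this sketch does not attempt.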
first_indexed 2024-04-13T19:33:55Z
format Article
id doaj.art-ec17b68b6ba944c7a68d847f22205867
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-04-13T19:33:55Z
publishDate 2022-06-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-ec17b68b6ba944c7a68d847f22205867
2022-12-22T02:33:07Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2022-06-01
vol. 12, no. 1, pp. 1-13
10.1038/s41598-022-14048-6
Identification of useful genes from multiple microarrays for ulcerative colitis diagnosis based on machine learning methods
Lin Zhang (0): Tianjin University of Traditional Chinese Medicine
Rui Mao (1): Tianjin University of Traditional Chinese Medicine
Chung Tai Lau (2): Chinese Clinical Trial Registry (Hong Kong), Hong Kong Chinese Medicine Clinical Study Centre, Chinese EQUATOR Centre, School of Chinese Medicine, Hong Kong Baptist University
Wai Chak Chung (3): Chinese Clinical Trial Registry (Hong Kong), Hong Kong Chinese Medicine Clinical Study Centre, Chinese EQUATOR Centre, School of Chinese Medicine, Hong Kong Baptist University
Jacky C. P. Chan (4): Department of Computer Science, HKBU Faculty of Science, Hong Kong Baptist University
Feng Liang (5): Chinese Clinical Trial Registry (Hong Kong), Hong Kong Chinese Medicine Clinical Study Centre, Chinese EQUATOR Centre, School of Chinese Medicine, Hong Kong Baptist University
Chenchen Zhao (6): Oncology Department, The Second Affiliated Hospital of Tianjin University of Traditional Chinese Medicine
Xuan Zhang (7): Chinese Clinical Trial Registry (Hong Kong), Hong Kong Chinese Medicine Clinical Study Centre, Chinese EQUATOR Centre, School of Chinese Medicine, Hong Kong Baptist University
Zhaoxiang Bian (8): Chinese Clinical Trial Registry (Hong Kong), Hong Kong Chinese Medicine Clinical Study Centre, Chinese EQUATOR Centre, School of Chinese Medicine, Hong Kong Baptist University
https://doi.org/10.1038/s41598-022-14048-6
title Identification of useful genes from multiple microarrays for ulcerative colitis diagnosis based on machine learning methods
url https://doi.org/10.1038/s41598-022-14048-6