Development of a Machine Learning Model to Distinguish between Ulcerative Colitis and Crohn’s Disease Using RNA Sequencing Data

Bibliographic Details
Main Authors: Soo-Kyung Park, Sangsoo Kim, Gi-Young Lee, Sung-Yoon Kim, Wan Kim, Chil-Woo Lee, Jong-Lyul Park, Chang-Hwan Choi, Sang-Bum Kang, Tae-Oh Kim, Ki-Bae Bang, Jaeyoung Chun, Jae-Myung Cha, Jong-Pil Im, Kwang-Sung Ahn, Seon-Young Kim, Dong-Il Park
Format: Article
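Language: English
Published: MDPI AG 2021-12-01
Series: Diagnostics
Subjects: inflammatory bowel disease; Crohn’s disease; ulcerative colitis; RNA sequencing; machine learning
Online Access: https://www.mdpi.com/2075-4418/11/12/2365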
_version_ 1797505443948396544
author Soo-Kyung Park
Sangsoo Kim
Gi-Young Lee
Sung-Yoon Kim
Wan Kim
Chil-Woo Lee
Jong-Lyul Park
Chang-Hwan Choi
Sang-Bum Kang
Tae-Oh Kim
Ki-Bae Bang
Jaeyoung Chun
Jae-Myung Cha
Jong-Pil Im
Kwang-Sung Ahn
Seon-Young Kim
Dong-Il Park
author_sort Soo-Kyung Park
collection DOAJ
description Crohn’s disease (CD) and ulcerative colitis (UC) can be difficult to differentiate. As differential diagnosis is important in establishing a long-term treatment plan for patients, we aimed to develop a machine learning model for the differential diagnosis of the two diseases using RNA sequencing (RNA-seq) data from endoscopic biopsy tissue from patients with inflammatory bowel disease (n = 127; CD, 94; UC, 33). Biopsy samples were taken from inflammatory lesions or normal tissues. The RNA-seq dataset was processed via mapping to the human reference genome (GRCh38) and quantifying the corresponding gene models that comprised 19,596 protein-coding genes. An unsupervised learning model showed distinct clusters of four classes: CD inflammatory, CD normal, UC inflammatory, and UC normal. A supervised learning model based on partial least squares discriminant analysis was able to distinguish inflammatory CD from inflammatory UC after pruning the strong classifiers of normal CD vs. normal UC. The error rate was minimal and affected only two components: 20 and 50 genes for the first and second components, respectively. The corresponding overall error rate was 0.147. RNA-seq analysis of tissue and the two components revealed in this study may be helpful for distinguishing CD from UC.
first_indexed 2024-03-10T04:18:40Z
format Article
id doaj.art-1d5aafe1393a4ee48304dcb5616eb1b1
institution Directory Open Access Journal
issn 2075-4418
language English
last_indexed 2024-03-10T04:18:40Z
publishDate 2021-12-01
publisher MDPI AG
record_format Article
series Diagnostics
spelling doaj.art-1d5aafe1393a4ee48304dcb5616eb1b1
2023-11-23T07:55:03Z
eng
MDPI AG
Diagnostics, ISSN 2075-4418
2021-12-01, Volume 11, Issue 12, Article 2365
DOI 10.3390/diagnostics11122365
Author affiliations:
Soo-Kyung Park: Division of Gastroenterology, Department of Internal Medicine and Inflammatory Bowel Disease Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea
Sangsoo Kim: Department of Bioinformatics, Soongsil University, Seoul 06978, Korea
Gi-Young Lee: Department of Bioinformatics, Soongsil University, Seoul 06978, Korea
Sung-Yoon Kim: Department of Bioinformatics, Soongsil University, Seoul 06978, Korea
Wan Kim: Department of Bioinformatics, Soongsil University, Seoul 06978, Korea
Chil-Woo Lee: Medical Research Institute, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea
Jong-Lyul Park: Personalized Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea
Chang-Hwan Choi: Department of Internal Medicine, College of Medicine, Chung-Ang University, Seoul 04388, Korea
Sang-Bum Kang: Department of Internal Medicine, College of Medicine, Daejeon St. Mary’s Hospital, The Catholic University of Korea, Daejeon 34943, Korea
Tae-Oh Kim: Department of Internal Medicine, Haeundae Paik Hospital, Inje University College of Medicine, Busan 48108, Korea
Ki-Bae Bang: Department of Internal Medicine, Dankook University College of Medicine, Cheonan 31116, Korea
Jaeyoung Chun: Department of Internal Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Korea
Jae-Myung Cha: Department of Internal Medicine, Kyung Hee University Hospital at Gang Dong, Kyung Hee University College of Medicine, Seoul 05278, Korea
Jong-Pil Im: Department of Internal Medicine and Liver Research Institute, College of Medicine, Seoul National University, Seoul 03080, Korea
Kwang-Sung Ahn: Functional Genome Institute, PDXen Biosystems Inc., Daejeon 34129, Korea
Seon-Young Kim: Personalized Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea
Dong-Il Park: Division of Gastroenterology, Department of Internal Medicine and Inflammatory Bowel Disease Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea
title Development of a Machine Learning Model to Distinguish between Ulcerative Colitis and Crohn’s Disease Using RNA Sequencing Data
title_sort development of a machine learning model to distinguish between ulcerative colitis and crohn s disease using rna sequencing data
topic inflammatory bowel disease
Crohn’s disease
ulcerative colitis
RNA sequencing
machine learning
url https://www.mdpi.com/2075-4418/11/12/2365