Development of a Machine Learning Model to Distinguish between Ulcerative Colitis and Crohn’s Disease Using RNA Sequencing Data
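Main Authors: | Soo-Kyung Park, Sangsoo Kim, Gi-Young Lee, Sung-Yoon Kim, Wan Kim, Chil-Woo Lee, Jong-Lyul Park, Chang-Hwan Choi, Sang-Bum Kang, Tae-Oh Kim, Ki-Bae Bang, Jaeyoung Chun, Jae-Myung Cha, Jong-Pil Im, Kwang-Sung Ahn, Seon-Young Kim, Dong-Il Park |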
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-12-01 |
Series: | Diagnostics |
Subjects: | |
Online Access: | https://www.mdpi.com/2075-4418/11/12/2365 |
author | Soo-Kyung Park, Sangsoo Kim, Gi-Young Lee, Sung-Yoon Kim, Wan Kim, Chil-Woo Lee, Jong-Lyul Park, Chang-Hwan Choi, Sang-Bum Kang, Tae-Oh Kim, Ki-Bae Bang, Jaeyoung Chun, Jae-Myung Cha, Jong-Pil Im, Kwang-Sung Ahn, Seon-Young Kim, Dong-Il Park |
---|---|
collection | DOAJ |
description | Crohn’s disease (CD) and ulcerative colitis (UC) can be difficult to differentiate. Because an accurate differential diagnosis is important for establishing a long-term treatment plan, we aimed to develop a machine learning model to differentiate the two diseases using RNA sequencing (RNA-seq) data from endoscopic biopsy tissue of patients with inflammatory bowel disease (n = 127; CD, 94; UC, 33). Biopsy samples were taken from inflammatory lesions or from normal tissue. The RNA-seq reads were mapped to the human reference genome (GRCh38), and expression was quantified for the corresponding gene models, which comprised 19,596 protein-coding genes. An unsupervised learning model showed distinct clusters corresponding to four classes: CD inflammatory, CD normal, UC inflammatory, and UC normal. A supervised learning model based on partial least squares discriminant analysis (PLS-DA) distinguished inflammatory CD from inflammatory UC after pruning the strong classifiers of normal CD vs. normal UC. The error rate was minimized with only two components, comprising 20 and 50 genes for the first and second components, respectively. The corresponding overall error rate was 0.147. RNA-seq analysis of biopsy tissue, together with the two components identified in this study, may help distinguish CD from UC. |
first_indexed | 2024-03-10T04:18:40Z |
format | Article |
id | doaj.art-1d5aafe1393a4ee48304dcb5616eb1b1 |
institution | Directory Open Access Journal |
issn | 2075-4418 |
language | English |
last_indexed | 2024-03-10T04:18:40Z |
publishDate | 2021-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Diagnostics |
citation | Diagnostics 2021, 11(12), 2365 |
doi | 10.3390/diagnostics11122365 |
affiliation | Soo-Kyung Park, Dong-Il Park: Division of Gastroenterology, Department of Internal Medicine and Inflammatory Bowel Disease Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea |
affiliation | Sangsoo Kim, Gi-Young Lee, Sung-Yoon Kim, Wan Kim: Department of Bioinformatics, Soongsil University, Seoul 06978, Korea |
affiliation | Chil-Woo Lee: Medical Research Institute, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea |
affiliation | Jong-Lyul Park, Seon-Young Kim: Personalized Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea |
affiliation | Chang-Hwan Choi: Department of Internal Medicine, College of Medicine, Chung-Ang University, Seoul 04388, Korea |
affiliation | Sang-Bum Kang: Department of Internal Medicine, College of Medicine, Daejeon St. Mary’s Hospital, The Catholic University of Korea, Daejeon 34943, Korea |
affiliation | Tae-Oh Kim: Department of Internal Medicine, Haeundae Paik Hospital, Inje University College of Medicine, Busan 48108, Korea |
affiliation | Ki-Bae Bang: Department of Internal Medicine, Dankook University College of Medicine, Cheonan 31116, Korea |
affiliation | Jaeyoung Chun: Department of Internal Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Korea |
affiliation | Jae-Myung Cha: Department of Internal Medicine, Kyung Hee University Hospital at Gang Dong, Kyung Hee University College of Medicine, Seoul 05278, Korea |
affiliation | Jong-Pil Im: Department of Internal Medicine and Liver Research Institute, College of Medicine, Seoul National University, Seoul 03080, Korea |
affiliation | Kwang-Sung Ahn: Functional Genome Institute, PDXen Biosystems Inc., Daejeon 34129, Korea |
title | Development of a Machine Learning Model to Distinguish between Ulcerative Colitis and Crohn’s Disease Using RNA Sequencing Data |
topic | inflammatory bowel disease; Crohn’s disease; ulcerative colitis; RNA sequencing; machine learning |
url | https://www.mdpi.com/2075-4418/11/12/2365 |
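The description refers to a PLS-DA model whose two components retain 20 and 50 genes. As a rough illustration only, the Python sketch below fits a two-class sparse PLS-DA (NIPALS PLS1 on a 0/1 class indicator, deflating X and y after each component) and reports a misclassification rate. Everything in it is an assumption rather than the authors' pipeline: the function names `fit_sparse_plsda`, `predict` and `error_rate`, the hard top-k gene selection via `keep` (loosely analogous to keeping 20 and 50 genes per component, and not necessarily how the authors' software selected genes), the 0.5 cut-off on the predicted indicator, and the toy expression matrix, which is invented and is not data from the study.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def fit_sparse_plsda(X: Matrix, labels: Sequence[int], n_components: int,
                     keep: Sequence[int]) -> dict:
    """Hypothetical two-class sparse PLS-DA: NIPALS PLS1 on a 0/1 class indicator.

    keep[a] is the (assumed) number of genes allowed a nonzero weight in component a.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(labels) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centred X, deflated below
    f = [lab - y_mean for lab in labels]                      # centred 0/1 response
    W, P, Q = [], [], []
    for a in range(n_components):
        # Weights proportional to each gene's covariance with the class indicator.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        # Hard sparsity (an assumption): keep only the keep[a] largest |w_j|.
        top = set(sorted(range(p), key=lambda j: abs(w[j]), reverse=True)[:keep[a]])
        w = [w[j] if j in top else 0.0 for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:  # nothing left to explain; stop early
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # sample scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y so the next component models what remains.
        for i in range(n):
            for j in range(p):
                E[i][j] -= t[i] * load[j]
            f[i] -= q * t[i]
        W.append(w)
        P.append(load)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def predict(model: dict, X: Matrix) -> List[int]:
    """Assign each sample to the nearer dummy code (1 if y_hat > 0.5, else 0)."""
    preds = []
    for row in X:
        x = [v - m for v, m in zip(row, model["x_mean"])]
        y_hat = model["y_mean"]
        for w, load, q in zip(model["W"], model["P"], model["Q"]):
            t = sum(xj * wj for xj, wj in zip(x, w))
            x = [xj - t * lj for xj, lj in zip(x, load)]  # same deflation as in fitting
            y_hat += q * t
        preds.append(1 if y_hat > 0.5 else 0)
    return preds


def error_rate(true: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of misclassified samples."""
    return sum(a != b for a, b in zip(true, pred)) / len(true)


# Invented toy data: 8 samples x 5 genes; genes 0 and 1 separate the two classes.
X_toy = [
    [5.0, 4.0, 1.0, 2.0, 0.5], [6.0, 5.0, 2.0, 1.0, 0.4],
    [5.5, 4.5, 1.5, 1.5, 0.6], [6.5, 5.5, 1.0, 2.5, 0.5],
    [1.0, 1.0, 1.2, 2.0, 0.5], [1.5, 0.5, 1.8, 1.0, 0.6],
    [0.5, 1.5, 1.0, 1.5, 0.4], [1.2, 0.8, 2.0, 2.2, 0.5],
]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical coding: 1 = class A, 0 = class B

model = fit_sparse_plsda(X_toy, y_toy, n_components=2, keep=[2, 2])
print(error_rate(y_toy, predict(model, X_toy)))  # 0.0 on this separable toy data
```

The error computed here is on the training samples and is therefore optimistic; a reported figure such as an overall error rate is normally estimated by cross-validation, and real RNA-seq counts would usually be normalized and log-transformed before fitting.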