Development of a Machine Learning Model to Distinguish between Ulcerative Colitis and Crohn's Disease Using RNA Sequencing Data

Cited by: 17
Authors
Park, Soo-Kyung [1, 2, 3]
Kim, Sangsoo [4]
Lee, Gi-Young [4]
Kim, Sung-Yoon [4]
Kim, Wan [4]
Lee, Chil-Woo [3]
Park, Jong-Lyul [5]
Choi, Chang-Hwan [6]
Kang, Sang-Bum [7]
Kim, Tae-Oh [8]
Bang, Ki-Bae [9]
Chun, Jaeyoung [10]
Cha, Jae-Myung [11]
Im, Jong-Pil [12, 13]
Ahn, Kwang-Sung [14]
Kim, Seon-Young [5]
Park, Dong-Il [1, 2, 3]
Affiliations
[1] Sungkyunkwan Univ, Kangbuk Samsung Hosp, Sch Med, Div Gastroenterol,Dept Internal Med, Seoul 03181, South Korea
[2] Sungkyunkwan Univ, Kangbuk Samsung Hosp, Sch Med, Inflammatory Bowel Dis Ctr, Seoul 03181, South Korea
[3] Sungkyunkwan Univ, Sch Med, Kangbuk Samsung Hosp, Med Res Inst, Seoul 03181, South Korea
[4] Soongsil Univ, Dept Bioinformat, Seoul 06978, South Korea
[5] Korea Res Inst Bioscience & Biotechnol KRIBB, Personalized Med Res Ctr, Daejeon 34141, South Korea
[6] Chung Ang Univ, Coll Med, Dept Internal Med, Seoul 04388, South Korea
[7] Catholic Univ Korea, Daejeon St Marys Hosp, Coll Med, Dept Internal Med, Daejeon 34943, South Korea
[8] Inje Univ, Coll Med, Haeundae Paik Hosp, Dept Internal Med, Busan 48108, South Korea
[9] Dankook Univ, Coll Med, Dept Internal Med, Cheonan 31116, South Korea
[10] Yonsei Univ, Coll Med, Gangnam Severance Hosp, Dept Internal Med, Seoul 06273, South Korea
[11] Kyung Hee Univ, Coll Med, Kyung Hee Univ Hosp Gang Dong, Dept Internal Med, Seoul 05278, South Korea
[12] Seoul Natl Univ, Coll Med, Dept Internal Med, Seoul 03080, South Korea
[13] Seoul Natl Univ, Coll Med, Liver Res Inst, Seoul 03080, South Korea
[14] PDXen Biosyst Inc, Funct Genome Inst, Daejeon 34129, South Korea
Funding
National Research Foundation of Singapore;
Keywords
inflammatory bowel disease; Crohn's disease; ulcerative colitis; RNA sequencing; machine learning; INFLAMMATORY-BOWEL-DISEASE; GENE-EXPRESSION;
DOI
10.3390/diagnostics11122365
Chinese Library Classification (CLC) number
R5 [Internal Medicine];
Discipline classification code
1002; 100201;
Abstract
Crohn's disease (CD) and ulcerative colitis (UC) can be difficult to differentiate. As differential diagnosis is important in establishing a long-term treatment plan for patients, we aimed to develop a machine learning model for the differential diagnosis of the two diseases using RNA sequencing (RNA-seq) data from endoscopic biopsy tissue from patients with inflammatory bowel disease (n = 127; CD, 94; UC, 33). Biopsy samples were taken from inflammatory lesions or normal tissues. The RNA-seq dataset was processed by mapping reads to the human reference genome (GRCh38) and quantifying the corresponding gene models, which comprised 19,596 protein-coding genes. An unsupervised learning model showed distinct clusters of four classes: CD inflammatory, CD normal, UC inflammatory, and UC normal. A supervised learning model based on partial least squares discriminant analysis was able to distinguish inflammatory CD from inflammatory UC after pruning the strong classifiers of normal CD vs. normal UC. The error rate was lowest with only two components, which comprised 20 and 50 genes for the first and second components, respectively. The corresponding overall error rate was 0.147. RNA-seq analysis of tissue and the two components revealed in this study may be helpful for distinguishing CD from UC.
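To make the supervised step in the abstract more concrete, the following is a minimal sketch of two-class partial least squares discriminant analysis (PLS-DA) with two components, followed by an overall error rate computed as misclassified samples divided by total samples. It is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the function names (`fit_pls_da`, `predict`, `make_samples`) are hypothetical, the software used in the study is not stated in this record, and this plain PLS1 variant does not perform the per-component gene selection (20 and 50 genes) that the abstract reports.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pls_da(X, y, n_components=2):
    """Fit a two-class PLS-DA model with the PLS1 (NIPALS) algorithm.

    X: list of samples, each a list of feature values (e.g. gene expression).
    y: list of class codes, +1.0 or -1.0 (hypothetical coding for two classes).
    """
    n, p = len(X), len(X[0])
    # Centre features and response; the means are reused at prediction time.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: feature direction with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = dot(w, w) ** 0.5
        if norm == 0.0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # sample scores on this component
        tt = dot(t, t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = dot(f, t) / tt  # regression of the class code on the scores
        # Deflate X and y by the part this component explains.
        E = [[E[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, loading, q))
    return x_mean, y_mean, components


def predict(model, x):
    """Classify one sample by the sign of its predicted class code."""
    x_mean, y_mean, components = model
    e = [a - b for a, b in zip(x, x_mean)]
    score = y_mean
    for w, loading, q in components:
        t = dot(e, w)
        score += q * t
        e = [a - t * b for a, b in zip(e, loading)]
    return 1.0 if score >= 0 else -1.0


def make_samples(rng, n_per_class, n_genes=30, n_informative=5, shift=1.5):
    """Synthetic data: only the first n_informative features differ by class."""
    X, y = [], []
    for label in (1.0, -1.0):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for j in range(n_informative):
                row[j] += label * shift
            X.append(row)
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_samples(rng, 30)
X_test, y_test = make_samples(rng, 20)

model = fit_pls_da(X_train, y_train, n_components=2)
predictions = [predict(model, x) for x in X_test]

# Overall error rate: misclassified samples divided by all samples.
errors = sum(1 for p, t in zip(predictions, y_test) if p != t)
error_rate = errors / len(y_test)
print(f"overall error rate on held-out synthetic samples: {error_rate:.3f}")
```

A sparse variant, which keeps only a fixed number of genes per component, would be closer to what the abstract describes; the sketch omits it to stay short and dependency-free.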
Pages: 11