BowSaw: Inferring Higher-Order Trait Interactions Associated With Complex Biological Phenotypes

Cited by: 1
Authors
DiMucci, Demetrius [1, 2, 7]
Kon, Mark [1, 3]
Segre, Daniel [1, 2, 4, 5, 6]
Affiliations
[1] Boston Univ, Bioinformat Grad Program, Boston, MA 02215 USA
[2] Boston Univ, Biol Design Ctr, Boston, MA 02215 USA
[3] Boston Univ, Dept Math & Stat, Boston, MA 02215 USA
[4] Boston Univ, Dept Biol, 5 Cummington St, Boston, MA 02215 USA
[5] Boston Univ, Dept Biomed Engn, Boston, MA 02215 USA
[6] Boston Univ, Dept Phys, 590 Commonwealth Ave, Boston, MA 02215 USA
[7] Forsyth Inst, Cambridge, MA USA
Funding
U.S. National Science Foundation
Keywords
high-order interactions; microbiome; epistasis; random forest; Boolean rules; decision tree; complex phenotypes; CROHNS-DISEASE; RANDOM FOREST; CLASSIFICATION; MICROBIOTA; DISCOVERY; DYSBIOSIS
DOI
10.3389/fmolb.2021.663532
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Machine learning is aiding the interpretation of biological complexity by enabling the inference and classification of cellular, organismal and ecological phenotypes based on large datasets, such as those from genomic, transcriptomic and metagenomic analyses. A number of available algorithms can search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While treating an algorithm as a black box is often sufficient, understanding how individual system variables contribute to a specific output is a promising avenue toward new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest model to identify combinations of variables ("rules") frequently used for classification. We first apply BowSaw to a simulated dataset and show that the algorithm can accurately recover the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the Integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn's disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.
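The abstract describes BowSaw only at a high level. As a rough illustration of the general idea of mining a forest's tree structure for variable combinations (a minimal sketch under stated assumptions, not BowSaw's published algorithm), the code below traces each positive-class sample through every tree of a toy, hand-built forest, records the set of variables tested along each decision path that classifies the sample correctly, and counts how often each combination recurs. The `Node` class, the function names `path_features` and `count_rule_candidates`, the three-tree forest, and the four-sample dataset are all hypothetical; a real analysis would read paths from a trained model instead (scikit-learn, for example, exposes `decision_path` on its random forest estimators).

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence, Tuple


@dataclass
class Node:
    """Hypothetical binary tree node: internal nodes split, leaves predict."""
    feature: Optional[int] = None      # index of the variable tested here
    threshold: float = 0.5             # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: Optional[int] = None   # set only on leaves


def path_features(tree: Node, x: Sequence[float]) -> Tuple[int, FrozenSet[int]]:
    """Follow x from root to leaf; return the leaf's class and the variables tested."""
    used = set()
    node = tree
    while node.prediction is None:
        used.add(node.feature)
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction, frozenset(used)


def count_rule_candidates(forest: List[Node], X, y, target: int = 1) -> Counter:
    """Count variable combinations on paths that correctly classify target-class samples."""
    counts: Counter = Counter()
    for x, label in zip(X, y):
        if label != target:
            continue  # only samples of the phenotype of interest
        for tree in forest:
            pred, feats = path_features(tree, x)
            if pred == target:  # keep only paths that classify the sample correctly
                counts[feats] += 1
    return counts


# Hypothetical toy data: the phenotype is the Boolean rule (x0 AND x1); x2 is noise.
X = [[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 0, 0]]
y = [1, 1, 0, 0]

leaf0, leaf1 = Node(prediction=0), Node(prediction=1)
forest = [
    Node(feature=0, left=leaf0, right=Node(feature=1, left=leaf0, right=leaf1)),
    Node(feature=1, left=leaf0, right=Node(feature=0, left=leaf0, right=leaf1)),
    Node(feature=2, left=leaf0, right=Node(feature=0, left=leaf0, right=leaf1)),  # imperfect tree
]

counts = count_rule_candidates(forest, X, y)
print(counts.most_common())
# [(frozenset({0, 1}), 4), (frozenset({0, 2}), 1)]
```

The combination {x0, x1} that generated the toy phenotype is the most frequent, while the noisy third tree contributes a weaker {x0, x2} candidate. Counting unordered variable sets ignores split directions and thresholds; a fuller rule representation would also record whether each variable fell above or below its threshold, and candidate rules would then need to be evaluated against the data.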
Pages: 12