R.ROSETTA: an interpretable machine learning framework

Cited by: 13
Authors
Garbulowski, Mateusz [1]
Diamanti, Klev [1, 2]
Smolinska, Karolina [1]
Baltzer, Nicholas [1, 3]
Stoll, Patricia [1, 4]
Bornelov, Susanne [1, 5]
Ohrn, Aleksander [6]
Feuk, Lars [2]
Komorowski, Jan [1, 7, 8, 9]
Affiliations
[1] Uppsala Univ, Dept Cell & Mol Biol, Uppsala, Sweden
[2] Uppsala Univ, Dept Immunol Genet & Pathol, Uppsala, Sweden
[3] Canc Registry Norway, Dept Res, Oslo, Norway
[4] Swiss Fed Inst Technol, Dept Biosyst Sci & Engn, Zurich, Switzerland
[5] Univ Cambridge, Canc Res UK Cambridge Inst, Cambridge, England
[6] Univ Oslo, Dept Informat, Oslo, Norway
[7] Swedish Coll Adv Study, Uppsala, Sweden
[8] Polish Acad Sci, Inst Comp Sci, Warsaw, Poland
[9] Washington Natl Primate Res Ctr, Seattle, WA USA
Funding
Swedish Research Council; US National Institutes of Health
Keywords
Transcriptomics; Interpretable machine learning; Big data; Rough sets; Rule-based classification; R package; ROUGH SET-THEORY; AUTISM SPECTRUM DISORDERS; GENE; CYCLOOXYGENASE-2; INFLAMMATION; ALGORITHMS; SELECTION; NETWORKS; MODELS; CELLS;
DOI
10.1186/s12859-021-04049-z
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Machine learning involves strategies and algorithms that may assist bioinformatics analyses in terms of data mining and knowledge discovery. In several applications, notably in the life sciences, it is often more important to understand how a prediction was obtained than to know what prediction was made. To this end, so-called interpretable machine learning has recently been advocated. In this study, we implemented an interpretable machine learning package based on rough set theory. An important aim of our work was to provide statistical properties of the models and their components.
Results: We present the R.ROSETTA package, which is an R wrapper of the ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling for accessible and transparent results, well-suited for adoption within the greater scientific community. The package also provides statistics and visualization tools that facilitate minimization of analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA. To illustrate the usage of the package, we applied it to a transcriptome dataset from an autism case-control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes.
Conclusions: R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated detection of dependencies for autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.
Pages: 18