A biochemically-interpretable machine learning classifier for microbial GWAS

Times cited: 59
Authors
Kavvas, Erol S. [1]
Yang, Laurence [2]
Monk, Jonathan M. [1]
Heckmann, David [1]
Palsson, Bernhard O. [1, 3]
Affiliations
[1] Univ Calif San Diego, Dept Bioengn, San Diego, CA 92103 USA
[2] Queens Univ, Dept Chem Engn, Kingston, ON K7L 3N6, Canada
[3] Univ Calif San Diego, Dept Pediat, San Diego, CA 92103 USA
Keywords
PARA-AMINOSALICYLIC ACID; MYCOBACTERIUM-TUBERCULOSIS; DRUG-RESISTANCE; MUTATIONS; MODELS; PYRAZINAMIDE; METABOLISM; VIRULENCE; PATHWAYS
DOI
10.1038/s41467-020-16310-9
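Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline codes
07; 0710; 09
Abstract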
Current machine learning classifiers have successfully been applied to whole-genome sequencing data to identify genetic determinants of antimicrobial resistance (AMR), but they lack causal interpretation. Here we present a metabolic model-based machine learning classifier, named Metabolic Allele Classifier (MAC), that uses flux balance analysis to estimate the biochemical effects of alleles. We apply the MAC to a dataset of 1595 drug-tested Mycobacterium tuberculosis strains and show that MACs predict AMR phenotypes with accuracy on par with mechanism-agnostic machine learning models (isoniazid AUC = 0.93) while enabling a biochemical interpretation of the genotype-phenotype map. Interpretation of MACs for three antibiotics (pyrazinamide, para-aminosalicylic acid, and isoniazid) recapitulates known AMR mechanisms and suggests a biochemical basis for how the identified alleles cause AMR. Extending flux balance analysis to identify accurate sequence classifiers thus contributes mechanistic insights to GWAS, a field thus far dominated by mechanism-agnostic results.
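The core idea in the abstract is that an allele can be described by how it constrains flux through the reactions its gene catalyses, with flux balance analysis (FBA) then estimating the downstream biochemical effect. The Python below is a minimal sketch under loudly stated assumptions: a made-up two-metabolite network, hypothetical allele names (`g1_ref`, `g1_partial`, `g1_lof`) with hypothetical flux bounds, and SciPy's `linprog` as the LP solver. For each allele it solves the FBA linear program (maximise the biomass flux subject to S v = 0 and lb <= v <= ub). It is an illustration of FBA, not the authors' MAC implementation.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not from the paper).
# Rows: metabolites A, B. Columns: uptake (-> A), enzyme reaction A -> B
# catalysed by hypothetical gene g1, bypass reaction A -> B, biomass drain (B ->).
S = np.array([
    [1.0, -1.0, -1.0, 0.0],   # A: produced by uptake, consumed by enzyme and bypass
    [0.0, 1.0, 1.0, -1.0],    # B: produced by enzyme and bypass, consumed by biomass
])
BIOMASS = 3  # column index of the objective reaction

# Hypothetical allele-specific upper bounds on the g1 reaction flux.
ALLELE_UB = {"g1_ref": 10.0, "g1_partial": 5.0, "g1_lof": 0.0}


def fba_objective(enzyme_ub: float) -> float:
    """Return the maximal biomass flux given an upper bound on the g1 reaction."""
    c = np.zeros(S.shape[1])
    c[BIOMASS] = -1.0  # linprog minimises, so negate the biomass flux to maximise it
    bounds = [
        (0, 10),         # uptake capacity (hypothetical)
        (0, enzyme_ub),  # allele-dependent g1 reaction bound
        (0, 3),          # bypass capacity (hypothetical)
        (0, None),       # biomass flux, unbounded above
    ]
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun  # undo the negation to report the maximised flux


if __name__ == "__main__":
    for allele, ub in ALLELE_UB.items():
        print(allele, round(fba_objective(ub), 6))
```

With these invented numbers the reference allele supports a biomass flux of 10, the partial allele 8, and the loss-of-function allele 3 (carried entirely by the bypass), because the optimum equals min(uptake capacity, g1 bound + bypass capacity). A classifier in the spirit of MAC would use such allele-dependent flux estimates, rather than raw allele presence, as its inputs; the published MAC's actual parameterisation is not reproduced here.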
Pages: 11