INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis

Cited by: 7
Authors
Zabeti, Hooman [1]
Dexter, Nick [2]
Safari, Amir Hosein [1]
Sedaghat, Nafiseh [1]
Libbrecht, Maxwell [1]
Chindelevitch, Leonid [3]
Affiliations
[1] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC, Canada
[2] Simon Fraser Univ, Dept Math, Burnaby, BC, Canada
[3] Imperial Coll, Dept Infect Dis Epidemiol, London, England
Funding
UK Medical Research Council
Keywords
Drug resistance; Interpretable machine learning; Group testing; Integer linear programming; Rule-based learning; Whole-genome sequencing; Random forests
DOI
10.1186/s13015-021-00198-1
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Predicting drug resistance and identifying its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving it requires a predictive model that is transparent, accurate, and flexible, and the methods currently used for this purpose rarely satisfy all three criteria. On the one hand, approaches that test strains against a catalogue of previously identified mutations often yield poor predictive performance. On the other hand, machine learning techniques typically achieve higher predictive accuracy but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit lower accuracy or lack the flexibility needed to generalize to previously unseen data.

Contribution: In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, that yields highly accurate predictions and interpretable results while remaining flexible enough to be optimized for various evaluation metrics.

Results: We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. Its accuracy is higher than or comparable to that of commonly used machine learning models, and it identifies variants in genes previously reported to be associated with drug resistance. Our method is intrinsically interpretable and can be customized for different evaluation metrics. Our implementation is publicly available and can be installed from the Python Package Index (PyPI) under the name ingotdr. The package is also compatible with most of the tools in the scikit-learn machine learning library.
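The group-testing view behind the abstract can be illustrated with a small sketch: each isolate is treated as a "test", each binary variant as an "item", and an isolate is predicted resistant if it carries at least one variant from a short learned list. The code below is not the ingotdr API and does not reproduce the integer linear program named in the keywords; it swaps in a simple greedy heuristic and assumes noiseless labels. The function names `fit_disjunctive_rule` and `predict`, and the toy data, are hypothetical.

```python
def fit_disjunctive_rule(X, y):
    """Learn a short OR-rule over binary features (greedy stand-in, not an ILP).

    X: list of rows, each a list of 0/1 variant indicators.
    y: list of 0/1 labels (1 = resistant, 0 = susceptible).
    Returns the sorted indices of the features kept in the rule.
    """
    n_features = len(X[0])
    # Noiseless assumption: a variant seen in any susceptible isolate
    # cannot be part of the rule.
    candidates = [
        j for j in range(n_features)
        if not any(row[j] and label == 0 for row, label in zip(X, y))
    ]
    uncovered = {i for i, label in enumerate(y) if label == 1}
    rule = []
    while uncovered:
        # Pick the candidate explaining the most still-unexplained resistant isolates.
        best = max(candidates,
                   key=lambda j: sum(X[i][j] for i in uncovered),
                   default=None)
        if best is None or sum(X[i][best] for i in uncovered) == 0:
            break  # remaining resistant isolates cannot be explained
        rule.append(best)
        uncovered -= {i for i in uncovered if X[i][best]}
    return sorted(rule)


def predict(X, rule):
    """Predict 1 (resistant) when an isolate carries any variant in the rule."""
    return [int(any(row[j] for j in rule)) for row in X]


# Toy data (fabricated for illustration only): 5 isolates, 4 variants.
X = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
y = [1, 1, 0, 0, 1]

rule = fit_disjunctive_rule(X, y)
predictions = predict(X, rule)
```

The learned rule is itself the explanation: it reads as "resistant if variant 0 or variant 1 is present". An exact optimization formulation could additionally trade off rule size against misclassifications, which this greedy sketch does not attempt.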
Pages: 12