Accurate and rapid prediction of tuberculosis drug resistance from genome sequence data using traditional machine learning algorithms and CNN

Times cited: 48
Authors
Kuang, Xingyan [1]
Wang, Fan [1]
Hernandez, Kyle M. [1, 2]
Zhang, Zhenyu [1]
Grossman, Robert L. [1, 2]
Affiliations
[1] Univ Chicago, Ctr Translat Data Sci, Chicago, IL 60615 USA
[2] Univ Chicago, Dept Med, Chicago, IL 60637 USA
DOI
10.1038/s41598-022-06449-4
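Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09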
Abstract
Effective and timely antibiotic treatment depends on accurate and rapid in silico antimicrobial resistance (AMR) predictions. Existing statistical rule-based methods for predicting Mycobacterium tuberculosis (MTB) drug resistance from bacterial genomic sequencing data often achieve varying results: high accuracy on some antibiotics but relatively low accuracy on others. Traditional machine learning (ML) approaches have been applied to classify drug resistance for MTB and have shown more stable performance. However, no study has applied a deep learning architecture such as a convolutional neural network (CNN) to a large and diverse cohort of MTB samples for AMR prediction. We developed 24 binary classifiers of MTB drug resistance status across eight anti-MTB drugs and three ML algorithms (logistic regression, random forest and 1D CNN) using a training dataset of 10,575 MTB isolates collected from 16 countries across six continents, where an extended pan-genome reference was used for detecting genetic features. Our 1D CNN architecture was designed to integrate both sequential and non-sequential features. In terms of F1-scores, the 1D CNN models are our best classifiers and are also more accurate and stable than the state-of-the-art rule-based tool Mykrobe predictor (81.1 to 93.8%, 93.7 to 96.2%, 93.1 to 94.8%, 95.9 to 97.2% and 97.1 to 98.2% for ethambutol, rifampicin, pyrazinamide, isoniazid and ofloxacin respectively). We applied filter-based feature selection to find AMR-relevant features. All selected variant features are AMR-related ones in the CARD database, and 78.8% of them are also in the catalogue of MTB mutations recently identified by WHO as associated with drug resistance. To facilitate ML model development for AMR prediction, we packaged every step into an automated pipeline and shared the source code at .
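The abstract compares classifiers by F1-score. As a minimal sketch of that metric only (not the authors' pipeline; the function name `f1_score_binary` and the toy labels are assumptions made for illustration), F1 is the harmonic mean of precision and recall for the resistant class:

```python
def f1_score_binary(y_true, y_pred):
    """F1 for binary labels, where 1 = resistant and 0 = susceptible."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correctly predicted resistant isolate: define F1 as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for eight isolates (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
score = f1_score_binary(y_true, y_pred)
```

Because F1 ignores true negatives, it is less flattering than plain accuracy when resistant isolates are rare, which is one plausible reason to report it for drugs with few resistant samples.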
Pages: 10