Prediction of antimicrobial resistance based on whole-genome sequencing and machine learning

Cited by: 97
Authors
Ren, Yunxiao [1]
Chakraborty, Trinad [2, 3]
Doijad, Swapnil [2, 3]
Falgenhauer, Linda [3, 4, 5]
Falgenhauer, Jane [2, 3]
Goesmann, Alexander [3, 6]
Hauschild, Anne-Christin [1]
Schwengers, Oliver [3, 6]
Heider, Dominik [1]
Affiliations
[1] Philipps Univ Marburg, Fac Math & Comp Sci, Dept Data Sci Biomed, D-35032 Marburg, Germany
[2] Justus Liebig Univ Giessen, Inst Med Microbiol, D-35392 Giessen, Germany
[3] German Ctr Infect Res, Partner Site Giessen Marburg Langen, D-35392 Giessen, Germany
[4] Justus Liebig Univ Giessen, Inst Hyg & Environm Med, D-35392 Giessen, Germany
[5] Hess Univ Kompetenzzentrum Krankenhaushyg, D-35392 Giessen, Germany
[6] Justus Liebig Univ Giessen, Dept Bioinformat & Syst Biol, D-35392 Giessen, Germany
Keywords
CHAOS GAME REPRESENTATION; ANTIBIOTIC-RESISTANCE; ESCHERICHIA-COLI; READ ALIGNMENT; MODEL
DOI
10.1093/bioinformatics/btab681
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Antimicrobial resistance (AMR) is one of the biggest global problems threatening human and animal health. Rapid and accurate AMR diagnostic methods are thus very urgently needed. However, traditional antimicrobial susceptibility testing (AST) is time-consuming, low throughput and viable only for cultivable bacteria. Machine learning methods may pave the way for automated AMR prediction based on genomic data of the bacteria. However, comparing different machine learning methods for the prediction of AMR based on different encodings and whole-genome sequencing data without previously known knowledge remains to be done.

Results: In this study, we evaluated logistic regression (LR), support vector machine (SVM), random forest (RF) and convolutional neural network (CNN) for the prediction of AMR for the antibiotics ciprofloxacin, cefotaxime, ceftazidime and gentamicin. We could demonstrate that these models can effectively predict AMR with label encoding, one-hot encoding and frequency matrix chaos game representation (FCGR encoding) on whole-genome sequencing data. We trained these models on a large AMR dataset and evaluated them on an independent public dataset. Generally, RFs and CNNs perform better than LR and SVM with AUCs up to 0.96. Furthermore, we were able to identify mutations that are associated with AMR for each antibiotic.
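
The three encodings named in the abstract can be illustrated on a short nucleotide string. The sketch below is not the authors' implementation. The integer mapping (A=1, C=2, G=3, T=4, anything else 0), the corner assignment used for the chaos game representation, and the function names `label_encode`, `one_hot_encode` and `fcgr` are all assumptions made for illustration.

```python
import numpy as np

# Assumed integer codes; 0 is reserved for any symbol other than A, C, G, T.
LABEL = {"A": 1, "C": 2, "G": 3, "T": 4}
# Assumed CGR corners as (x_bit, y_bit); other conventions exist.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def label_encode(seq):
    """Label encoding: one integer per base."""
    return np.array([LABEL.get(b, 0) for b in seq.upper()], dtype=np.int64)


def one_hot_encode(seq):
    """One-hot encoding: one row per base, one column per A/C/G/T.

    Unknown symbols are left as an all-zero row.
    """
    labels = label_encode(seq)
    out = np.zeros((len(labels), 4), dtype=np.int64)
    rows = np.flatnonzero(labels > 0)
    out[rows, labels[rows] - 1] = 1
    return out


def fcgr(seq, k=2):
    """Frequency matrix chaos game representation.

    Counts every k-mer into one cell of a 2^k x 2^k matrix. The last base
    of the k-mer selects the largest quadrant, the first base the smallest.
    k-mers containing an unknown symbol are skipped.
    """
    seq = seq.upper()
    size = 2 ** k
    mat = np.zeros((size, size), dtype=np.int64)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(b not in CORNER for b in kmer):
            continue
        x = y = 0
        for b in reversed(kmer):  # last base becomes the most significant bit
            xb, yb = CORNER[b]
            x = (x << 1) | xb
            y = (y << 1) | yb
        mat[y, x] += 1
    return mat


if __name__ == "__main__":
    example = "ACGTN"
    print(label_encode(example))
    print(one_hot_encode(example))
    print(fcgr("ACGTACGT", k=2))
```

Label and one-hot encodings keep one entry per position, so their size grows with sequence length. An FCGR matrix has a fixed 2^k x 2^k shape for any input length, which makes it a convenient image-like input for a CNN.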
Pages: 325-334
Page count: 10
Related papers
55 in total
[1]   Phenotypic and genotypic characterization of antibiotic resistance in the methicillin-resistant Staphylococcus aureus strains isolated from hospital cockroaches [J].
Abdolmaleki, Zohreh ;
Mashak, Zohreh ;
Dehkordi, Farhad Safarpoor .
ANTIMICROBIAL RESISTANCE AND INFECTION CONTROL, 2019, 8 (1)
[2]   Analysis of genomic sequences by Chaos Game Representation [J].
Almeida, JS ;
Carriço, JA ;
Maretzek, A ;
Noble, PA ;
Fletcher, M .
BIOINFORMATICS, 2001, 17 (05) :429-437
[3]   DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data [J].
Arango-Argoty, Gustavo ;
Garner, Emily ;
Prudent, Amy ;
Heath, Lenwood S. ;
Vikesland, Peter ;
Zhang, Liqing .
MICROBIOME, 2018, 6
[4]   Antimicrobial Resistance and Virulence: a Successful or Deleterious Association in the Bacterial World? [J].
Beceiro, Alejandro ;
Tomas, Maria ;
Bou, German .
CLINICAL MICROBIOLOGY REVIEWS, 2013, 26 (02) :185-230
[5]   Sequencing-based methods and resources to study antimicrobial resistance [J].
Boolchandani, Manish ;
D'Souza, Alaric W. ;
Dantas, Gautam .
NATURE REVIEWS GENETICS, 2019, 20 (06) :356-370
[6]   Antibiotic resistance and single-nucleotide polymorphism cluster grouping type in a multinational sample of resistant Mycobacterium tuberculosis isolates [J].
Brimacombe, M. ;
Hazbon, M. ;
Motiwala, A. S. ;
Alland, D. .
ANTIMICROBIAL AGENTS AND CHEMOTHERAPY, 2007, 51 (11) :4157-4159
[7]   fastp: an ultra-fast all-in-one FASTQ preprocessor [J].
Chen, Shifu ;
Zhou, Yanqing ;
Chen, Yaru ;
Gu, Jia .
BIOINFORMATICS, 2018, 34 (17) :i884-i890
[8]   A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3 [J].
Cingolani, Pablo ;
Platts, Adrian ;
Wang, Le Lily ;
Coon, Melissa ;
Tung Nguyen ;
Wang, Luan ;
Land, Susan J. ;
Lu, Xiangyi ;
Ruden, Douglas M. .
FLY, 2012, 6 (02) :80-92
[9]   Twelve years of SAMtools and BCFtools [J].
Danecek, Petr ;
Bonfield, James K. ;
Liddle, Jennifer ;
Marshall, John ;
Ohan, Valeriu ;
Pollard, Martin O. ;
Whitwham, Andrew ;
Keane, Thomas ;
McCarthy, Shane A. ;
Davies, Robert M. ;
Li, Heng .
GIGASCIENCE, 2021, 10 (02)
[10]   The variant call format and VCFtools [J].
Danecek, Petr ;
Auton, Adam ;
Abecasis, Goncalo ;
Albers, Cornelis A. ;
Banks, Eric ;
DePristo, Mark A. ;
Handsaker, Robert E. ;
Lunter, Gerton ;
Marth, Gabor T. ;
Sherry, Stephen T. ;
McVean, Gilean ;
Durbin, Richard .
BIOINFORMATICS, 2011, 27 (15) :2156-2158