Bacterial species identification using MALDI-TOF mass spectrometry and machine learning techniques: A large-scale benchmarking study

Cited by: 43
Authors
Mortier, Thomas [1]
Wieme, Anneleen D. [2]
Vandamme, Peter [2]
Waegeman, Willem [1]
Affiliations
[1] Univ Ghent, Fac Biosci Engn, Dept Data Anal & Math Modelling, KERMIT, Coupure Links 653, B-9000 Ghent, Belgium
[2] Univ Ghent, Fac Sci, Lab Microbiol, BCCM LMG Bacteria Collect, KL Ledeganckstr 35, B-9000 Ghent, Belgium
Keywords
Bacterial species identification; MALDI-TOF MS; Machine learning; Extreme classification; Hierarchical classification; Neural networks; DESORPTION IONIZATION-TIME; HIERARCHICAL-CLASSIFICATION; STAPHYLOCOCCUS-AUREUS; SPECTRA; NETWORKS; SYSTEM; MS;
DOI
10.1016/j.csbj.2021.11.004
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Today, machine learning methods are commonly deployed for bacterial species identification using MALDI-TOF mass spectrometry data. However, most studies reported in the literature consider only very traditional machine learning methods on small datasets that contain a limited number of species. In this paper we present benchmarking results on an unprecedented scale for a wide range of machine learning methods, using datasets that contain almost 100,000 spectra and more than 1000 different species. The size and diversity of the data allow us to compare three important identification scenarios that are often not distinguished in the literature: identification of novel biological replicates, of novel strains, and of novel species that are not present in the training data. The results demonstrate that acceptable identification rates are obtained in all three scenarios, but the numbers are typically lower than those reported in studies with a more limited analysis. Using hierarchical classification methods, we also demonstrate that taxonomic information is in general not well preserved in MALDI-TOF mass spectrometry data. For the novel species scenario, we apply neural networks with Monte Carlo dropout to the detection of novel species for the first time; such networks have been shown to be successful in other domains, such as computer vision. © 2021 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
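The abstract names Monte Carlo dropout for novel species detection but does not describe the architecture or the uncertainty score used. As a rough illustration of the general technique only, the sketch below keeps dropout active at prediction time, averages the class probabilities over several stochastic forward passes, and reports the entropy of the averaged distribution. The toy network, its weights, the function names, and the use of predictive entropy are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_with_dropout(x, w_hidden, w_out, p_drop, rng):
    """One stochastic forward pass of a toy one-hidden-layer network.

    Dropout stays active at prediction time, which is the core idea of
    Monte Carlo dropout. The network itself is hypothetical.
    """
    # Hidden layer with ReLU activation.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # Inverted dropout: zero each unit with probability p_drop, rescale the rest.
    keep = 1.0 - p_drop
    hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return softmax(logits)


def mc_dropout_predict(x, w_hidden, w_out, n_passes=100, p_drop=0.5, seed=0):
    """Average class probabilities over n_passes stochastic passes.

    Returns the mean probabilities and their entropy (in nats); a high
    entropy suggests the input may not belong to any training class.
    """
    rng = random.Random(seed)
    mean = [0.0] * len(w_out)
    for _ in range(n_passes):
        probs = forward_with_dropout(x, w_hidden, w_out, p_drop, rng)
        mean = [m + p / n_passes for m, p in zip(mean, probs)]
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


# Toy weights: 2 input features, 3 hidden units, 2 classes (all made up).
W_HIDDEN = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
W_OUT = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]

mean_probs, uncertainty = mc_dropout_predict([1.0, 2.0], W_HIDDEN, W_OUT)
```

In such a setup, a spectrum would be flagged as a possible novel species when `uncertainty` exceeds a threshold chosen on held-out data; the threshold and the choice of entropy as the score are again assumptions rather than details taken from the paper.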
Pages: 6157-6168
Page count: 12