Semi-supervised learning for MALDI-TOF mass spectrometry data classification: an application in the salmon industry

Cited by: 7
Authors
Gonzalez, Camila [1]
Astudillo, Cesar A. [2]
Lopez-Cortes, Xaviera A. [3]
Maldonado, Sebastian [4, 5]
Affiliations
[1] Univ Talca, Fac Engn, Magister Gest Operac, Curico, Chile
[2] Univ Talca, Fac Engn, Dept Comp Sci, Curico, Chile
[3] Univ Catol Maule, Fac Engn, Dept Comp Sci & Ind, Talca, Chile
[4] Univ Chile, Sch Econ & Business, Dept Management Control & Informat Syst, Santiago, Chile
[5] Inst Sistemas Complejos Ingn ISCI, Santiago, Chile
Keywords
Semi-supervised learning; Mass spectrometry; Aquaculture; DESORPTION IONIZATION-TIME; IDENTIFICATION; MACHINE; SPECTRA; SYSTEM
DOI
10.1007/s00521-023-08333-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
MALDI-TOF mass spectrometry, which combines Matrix-Assisted Laser Desorption-Ionization (MALDI) with a Time-of-Flight (TOF) detector, is a promising strategy for identifying patterns in data and provides a relevant methodology for rapid and accurate identification of microorganisms. However, this type of data is challenging to analyze because of its high complexity, and correct labeling is sometimes impossible. To address this problem, advanced data analysis techniques such as machine learning methods can be applied. In this work, we propose a novel approach that uses the semi-supervised paradigm to classify MALDI-TOF mass spectrometry data. Our study uses both labeled and unlabeled data to alleviate the labeling problem. Specifically, we analyzed mass spectrometry data from healthy salmon and from salmon infected with the pathogen Piscirickettsia salmonis. Our proposed algorithm, based on self-training, outperformed traditional machine learning methods (naive Bayes, random forest, and support vector machine). Even with a small percentage of labeled instances (25%), semi-supervised learning attains balanced performance across all metrics. Experimental results showed that self-training with a random forest classifier reached an accuracy of 0.9, a sensitivity of 0.75, and a specificity of 1. Furthermore, feature selection identified 15 potential biomarkers that accurately define the profiles of healthy and infected salmon. More generally, these results demonstrate the potential of the proposed semi-supervised learning methodology for classifying MALDI-TOF mass spectrometry data.
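The self-training idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract reports results with a random forest base classifier, whereas the sketch below substitutes a simple nearest-centroid classifier so that it runs with the Python standard library alone. The two-feature synthetic data, the confidence threshold of 0.8, and all function names (`fit_centroids`, `self_train`, `binary_metrics`) are assumptions made for illustration only.

```python
import math
import random


def fit_centroids(X, y):
    """Return {label: mean feature vector} for the labeled rows."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict_proba(model, x):
    """Softmax over negative distances: a closer centroid gets a higher score."""
    weights = {label: math.exp(-math.dist(x, c)) for label, c in model.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def predict(model, x):
    proba = predict_proba(model, x)
    return max(proba, key=proba.get)


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Iteratively pseudo-label confident unlabeled rows and refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = fit_centroids(X_lab, y_lab)
        remaining = []
        for x in pool:
            proba = predict_proba(model, x)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                X_lab.append(x)      # accept the pseudo-label
                y_lab.append(label)
            else:
                remaining.append(x)  # keep for a later round
        if len(remaining) == len(pool):
            break  # nothing was confident enough; stop early
        pool = remaining
    return fit_centroids(X_lab, y_lab)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Synthetic stand-in for spectra: 0 = healthy, 1 = infected (hypothetical data).
rng = random.Random(0)


def sample(center, n):
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(n)]


healthy, infected = sample(0.0, 20), sample(5.0, 20)
# 25% labeled: 5 rows per class; the other 30 rows are treated as unlabeled.
X_lab = healthy[:5] + infected[:5]
y_lab = [0] * 5 + [1] * 5
X_unlab = healthy[5:] + infected[5:]
model = self_train(X_lab, y_lab, X_unlab)

X_test = sample(0.0, 10) + sample(5.0, 10)
y_test = [0] * 10 + [1] * 10
metrics = binary_metrics(y_test, [predict(model, x) for x in X_test])
print(metrics)
```

With scikit-learn available, a closer analogue of the reported setup would likely wrap `RandomForestClassifier` in `sklearn.semi_supervised.SelfTrainingClassifier`, marking unlabeled rows with the label `-1`; that pairing is a suggestion here, not something the abstract specifies.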
Pages: 9381-9391
Page count: 11