Spectroscopy Approaches for Food Safety Applications: Improving Data Efficiency Using Active Learning and Semi-supervised Learning

Times cited: 1
Authors
Zhang, Huanle [1]
Wisuthiphaet, Nicharee [2]
Cui, Hemiao [2]
Nitin, Nitin [2]
Liu, Xin [1]
Zhao, Qing [3]
Affiliations
[1] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
[2] Univ Calif Davis, Dept Food Sci & Technol, Davis, CA USA
[3] Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY USA
Source
FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2022 / Vol. 5
Funding
National Institute of Food and Agriculture (USA);
Keywords
food science; spectroscopy analysis; machine learning; data efficiency; active learning; semi-supervised learning; TRANSFORM INFRARED-SPECTROSCOPY; OPTIMAL EXPERIMENTAL-DESIGNS; FLUORESCENCE SPECTROSCOPY; REGRESSION; QUALITY; IDENTIFICATION; CHEMOMETRICS; MODELS;
DOI
10.3389/frai.2022.863261
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The past decade has witnessed rapid development in measurement and monitoring technologies for food science. Among these technologies, spectroscopy has been widely used to analyze food quality, safety, and nutritional properties. Because of the complexity of food systems and the lack of comprehensive predictive models, rapid and simple measurements that can predict complex properties of food systems are largely missing. Machine Learning (ML) has shown great potential for improving the classification and prediction of these properties. However, the barriers to collecting large datasets for ML applications still persist. In this paper, we explore different approaches to data annotation and model training that improve data efficiency for ML applications. Specifically, we leverage Active Learning (AL) and Semi-Supervised Learning (SSL) and investigate four approaches: baseline passive learning, AL, SSL, and a hybrid of AL and SSL. To evaluate these approaches, we collect two spectroscopy datasets: one for predicting plasma dosage and one for detecting foodborne pathogens. Our experimental results show that, compared to the de facto passive learning approach, the advanced approaches (AL, SSL, and the hybrid) can greatly reduce the number of labeled samples required, in some cases by more than half.
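The abstract names four annotation strategies (passive learning, AL, SSL, and an AL+SSL hybrid) without detailing them. The following is a minimal Python sketch of how they can differ, under loudly stated assumptions: a toy nearest-centroid classifier, margin-based uncertainty sampling as the AL query rule, and confidence-thresholded self-training as the SSL method. These choices, every function name (`fit_centroids`, `active_query`, `pseudo_label`, `hybrid_round`, and so on), and the toy 2-D data are hypothetical illustrations, not the authors' models, query strategies, or datasets. The plasma-dosage task is a regression problem, which this classification-only sketch does not cover.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Centroids = Dict[int, List[float]]


def fit_centroids(X: List[Vector], y: List[int]) -> Centroids:
    """Toy nearest-centroid model (an assumption): mean feature vector per class."""
    groups: Dict[int, List[Vector]] = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def ranked(cents: Centroids, x: Vector) -> List[Tuple[float, int]]:
    """(distance, label) pairs, nearest centroid first."""
    return sorted((math.dist(x, c), label) for label, c in cents.items())


def margin(cents: Centroids, x: Vector) -> float:
    """Distance gap between the two nearest centroids; a small gap = uncertain sample."""
    r = ranked(cents, x)
    return r[1][0] - r[0][0] if len(r) > 1 else math.inf


def passive_query(pool: List[int], k: int, rng: random.Random) -> List[int]:
    """Passive-learning baseline: send k randomly chosen pool samples for labeling."""
    return rng.sample(pool, min(k, len(pool)))


def active_query(cents: Centroids, X: List[Vector], pool: List[int], k: int) -> List[int]:
    """Active learning via uncertainty sampling: label the k smallest-margin samples."""
    return sorted(pool, key=lambda i: margin(cents, X[i]))[:k]


def pseudo_label(cents: Centroids, X: List[Vector], pool: List[int],
                 threshold: float) -> Dict[int, int]:
    """Self-training (one SSL variant): adopt the model's prediction for confident samples."""
    return {i: ranked(cents, X[i])[0][1] for i in pool if margin(cents, X[i]) >= threshold}


def hybrid_round(X: List[Vector], labels: Dict[int, int], pool: List[int],
                 oracle: Callable[[int], int], k: int, threshold: float):
    """One AL+SSL round: query the oracle for uncertain samples, pseudo-label confident ones."""
    cents = fit_centroids([X[i] for i in labels], list(labels.values()))
    queried = active_query(cents, X, pool, k)
    labels = {**labels, **{i: oracle(i) for i in queried}}  # human (oracle) labels
    rest = [i for i in pool if i not in labels]
    pseudo = pseudo_label(cents, X, rest, threshold)        # machine labels, kept separate
    remaining = [i for i in rest if i not in pseudo]
    return labels, pseudo, remaining


# Toy usage: two well-separated clusters plus one ambiguous point (index 4).
X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [2.4, 2.6], [0.1, 0.3], [4.9, 5.0]]
truth = [0, 0, 1, 1, 1, 0, 1]        # simulated oracle; in practice a lab measurement
seed_labels = {0: 0, 2: 1}            # two initially labeled samples
pool = [1, 3, 4, 5, 6]
labels, pseudo, remaining = hybrid_round(X, seed_labels, pool,
                                         lambda i: truth[i], k=1, threshold=3.0)
```

In this toy run, the single oracle query goes to the ambiguous sample between the clusters, while the clearly separated samples are pseudo-labeled without spending any labeling budget. Keeping pseudo-labels apart from oracle labels lets a later round down-weight or discard them; that separation is a design choice assumed here, not one taken from the paper.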
Pages: 13