An automatic generation of pre-processing strategy combined with machine learning multivariate analysis for NIR spectral data

Cited by: 10
Authors
Arianti, Nunik Destria [1]
Saputra, Edo [2, 3]
Sitorus, Agustami [4, 5]
Affiliations
[1] Nusa Putra Univ, Dept Informat Syst, Sukabumi 43155, Indonesia
[2] Univ Riau, Fac Agr, Dept Agr Technol, Pekanbaru 28293, Indonesia
[3] IPB Univ, Agr Engn Study Program, Bogor 16680, Indonesia
[4] Natl Res & Innovat Agcy BRIN, Res Ctr Appropriate Technol, Subang 41213, Indonesia
[5] King Mongkuts Inst Technol Ladkrabang, Sch Engn, Dept Agr Engn, Bangkok 10520, Thailand
Keywords
Ensemble pre-processing; Chemometrics; Machine learning; AGoES
DOI
10.1016/j.jafr.2023.100625
Chinese Library Classification number
S [Agricultural Sciences]
Discipline classification code
09
Abstract
Pre-processing of near-infrared (NIR) spectral data is indispensable in multivariate analysis because the measured spectra of complex samples are often affected by strong background signals, light scattering, varying noise, and other unexpected factors. Various pre-processing methods have been developed to remove or reduce these interferences. Until now, most applications of NIR pre-processing in multivariate calibration have relied on trial and error, with the choice of method depending on the nature of the data and on the practitioner's expertise and experience. It is therefore usually challenging to determine the best pre-processing method for a given dataset. To tackle this problem, this study proposes a new data pre-processing concept, the automatic generation of a pre-processing strategy (AGoES). The concept belongs to the family of ensemble pre-processing methods: machine learning algorithms (PLSR, SVM, k-NN, DT, AB, and GPR) built on differently pre-processed data are combined by 5-fold cross-validation and grid search optimization. To evaluate the concept, a public NIR spectral dataset was used to predict three responses in manure organic waste: dry matter content (DM), organic matter content (OM), and ammonium nitrogen content (AN). The results show that SVM combined with AGoES pre-processing is the best algorithm for predicting DM and AN, with a ratio of prediction to deviation (RPD) of 3.619 and 2.996, respectively. AB in tandem with AGoES pre-processing is the best strategy for predicting OM, with an RPD of 3.185. Within the AGoES framework, pre-processing is therefore unsupervised, simpler, and feasible to combine with multivariate analysis based on machine learning algorithms.
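The selection idea described in the abstract (candidate pre-processing methods evaluated by 5-fold cross-validation and grid search, with RPD as the figure of merit) can be illustrated with a minimal sketch. This is not the authors' AGoES implementation. The candidate pre-processors (raw, standard normal variate, first difference), the k-NN regressor, the interleaved fold layout, the synthetic data, and all function names are assumptions chosen to keep the example dependent on the Python standard library only. RPD is computed here with its common definition, the standard deviation of the reference values divided by the root mean square error of prediction.

```python
import math
import random
import statistics

def identity(spectrum):
    return list(spectrum)

def snv(spectrum):
    # Standard normal variate: centre and scale each spectrum by its own mean and SD.
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mean) / sd for v in spectrum]

def first_difference(spectrum):
    # Crude first derivative: difference between neighbouring wavelengths.
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

# Hypothetical candidate set; the abstract does not list the methods used.
PREPROCESSORS = {"raw": identity, "snv": snv, "diff1": first_difference}

def knn_predict(train_X, train_y, x, k):
    # Mean response of the k training spectra closest to x (Euclidean distance).
    nearest = sorted(zip((math.dist(row, x) for row in train_X), train_y))[:k]
    return statistics.fmean(y for _, y in nearest)

def rmse(y_true, y_pred):
    return math.sqrt(statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

def rpd(y_true, y_pred):
    # SD of the reference values divided by the RMSE of the predictions.
    return statistics.stdev(y_true) / rmse(y_true, y_pred)

def cross_val_predict(X, y, k, n_folds=5):
    # Out-of-fold predictions; sample i belongs to fold i % n_folds.
    n = len(X)
    preds = [0.0] * n
    for fold in range(n_folds):
        test_idx = range(fold, n, n_folds)
        held_out = set(test_idx)
        train_X = [X[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        for i in test_idx:
            preds[i] = knn_predict(train_X, train_y, X[i], k)
    return preds

def select_strategy(spectra, y, k_grid=(1, 3, 5)):
    # Grid search over (pre-processing, k); keep the lowest cross-validated RMSE.
    best = None
    for name, transform in PREPROCESSORS.items():
        X = [transform(s) for s in spectra]
        for k in k_grid:
            preds = cross_val_predict(X, y, k)
            score = rmse(y, preds)
            if best is None or score < best[0]:
                best = (score, name, k, rpd(y, preds))
    return best

def make_toy_data(n=40, n_wl=30, seed=0):
    # Synthetic spectra: sloped baseline plus a peak scaled by the response,
    # distorted by a random gain and offset that mimic scatter effects.
    rng = random.Random(seed)
    spectra, y = [], []
    for _ in range(n):
        conc = rng.uniform(0.0, 1.0)
        gain, offset = rng.uniform(0.5, 1.5), rng.uniform(-0.2, 0.2)
        row = []
        for j in range(n_wl):
            signal = 1.0 + 0.02 * j + conc * math.exp(-((j - 15) / 4.0) ** 2)
            row.append(gain * signal + offset + rng.gauss(0.0, 0.005))
        spectra.append(row)
        y.append(conc)
    return spectra, y

spectra, y = make_toy_data()
best_rmse, best_name, best_k, best_rpd = select_strategy(spectra, y)
print(f"selected: {best_name}, k={best_k}, RMSECV={best_rmse:.3f}, RPD={best_rpd:.2f}")
```

The sketch selects a single best combination for one response. An ensemble in the sense of the abstract would instead combine models built on several pre-processed versions of the data, and the procedure would be repeated for each response (DM, OM, AN) and each algorithm.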
Pages: 9