An automatic generation of pre-processing strategy combined with machine learning multivariate analysis for NIR spectral data

Cited by: 8
Authors
Arianti, Nunik Destria [1]
Saputra, Edo [2, 3]
Sitorus, Agustami [4, 5]
Affiliations
[1] Nusa Putra Univ, Dept Informat Syst, Sukabumi 43155, Indonesia
[2] Univ Riau, Fac Agr, Dept Agr Technol, Pekanbaru 28293, Indonesia
[3] IPB Univ, Agr Engn Study Program, Bogor 16680, Indonesia
[4] Natl Res & Innovat Agcy BRIN, Res Ctr Appropriate Technol, Subang 41213, Indonesia
[5] King Mongkuts Inst Technol Ladkrabang, Sch Engn, Dept Agr Engn, Bangkok 10520, Thailand
Keywords
Ensemble pre-processing; Chemometrics; Machine learning; AGoES
DOI
10.1016/j.jafr.2023.100625
Chinese Library Classification number
S [Agricultural Sciences]
Discipline classification code
09
Abstract
Pre-processing of near-infrared (NIR) spectral data is indispensable in multivariate analysis because the measured spectra of complex samples are often affected by overwhelming background, light scattering, varying noise, and other unexpected factors. Various pre-processing methods have been developed to remove or reduce these effects. Until now, most applications of NIR spectral pre-processing in multivariate calibration have relied on trial and error, with the choice of method depending on the nature of the data and on the practitioner's expertise and experience. It is therefore usually difficult to determine the best pre-processing method for a given dataset. To tackle this problem, this study proposes a new data pre-processing concept: the automatic generation of a pre-processing strategy (AGoES). The concept belongs to the family of ensemble pre-processing methods, in which machine learning algorithms (PLSR, SVM, k-NN, DT, AB, and GPR) built on differently pre-processed data are combined through 5-fold cross-validation and grid search optimization. To evaluate the concept, a public NIR spectral dataset was used to predict three responses for manure organic waste: dry matter content (DM), organic matter content (OM), and ammonium nitrogen content (AN). The results show that SVM combined with AGoES pre-processing is the best algorithm for predicting DM and AN, with ratio of prediction to deviation (RPD) values of 3.619 and 2.996, respectively. AB in tandem with AGoES pre-processing is the best strategy for predicting OM, with an RPD of 3.185. The AGoES concept therefore offers unsupervised pre-processing that is simpler and feasible to apply in multivariate analysis using machine learning algorithms.
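The idea of choosing a pre-processing step and a model setting together by 5-fold cross-validation and grid search can be illustrated with a small sketch. This is not the authors' AGoES implementation: the candidate pre-processing steps (raw, standard normal variate, first difference), the k-NN regressor written by hand, the synthetic spectra, and every function name below are assumptions made for illustration. RPD is computed here as the standard deviation of the reference values divided by the RMSE of prediction, which is a common definition and may differ from the one used in the paper.

```python
import math
import random
import statistics
from itertools import product


def identity(spectrum):
    """No pre-processing (baseline candidate)."""
    return list(spectrum)


def snv(spectrum):
    """Standard normal variate: centre and scale a single spectrum."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mean) / sd for v in spectrum]


def first_difference(spectrum):
    """Difference between neighbouring points, a crude first derivative."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


# Hypothetical candidate set; the paper's actual candidates are not listed here.
PREPROCESSORS = {"raw": identity, "snv": snv, "diff": first_difference}


def knn_predict(train_x, train_y, query, k):
    """Predict as the mean response of the k nearest training spectra."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return statistics.fmean([y for _, y in ranked[:k]])


def rmse(y_true, y_pred):
    return math.sqrt(statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def rpd(y_true, y_pred):
    """Assumed definition: SD of reference values / RMSE of prediction."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


def cv_rmse(x, y, k, n_folds=5):
    """RMSE over out-of-fold predictions; sample i belongs to fold i % n_folds."""
    preds = [0.0] * len(y)
    for fold in range(n_folds):
        train_idx = [i for i in range(len(y)) if i % n_folds != fold]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in range(fold, len(y), n_folds):
            preds[i] = knn_predict(train_x, train_y, x[i], k)
    return rmse(y, preds)


def grid_search(spectra, y, ks=(1, 3, 5)):
    """Return (cv_rmse, preprocessor_name, k) with the lowest CV error."""
    best = None
    for (name, fn), k in product(PREPROCESSORS.items(), ks):
        x = [fn(s) for s in spectra]
        score = cv_rmse(x, y, k)
        if best is None or score < best[0]:
            best = (score, name, k)
    return best


def make_demo_data(n=40, n_points=20, seed=0):
    """Synthetic spectra: one peak scaled by the response, plus random scatter."""
    rng = random.Random(seed)
    spectra, y = [], []
    for _ in range(n):
        conc = rng.uniform(0.0, 1.0)
        scatter = rng.uniform(0.5, 1.5)  # multiplicative light-scatter effect
        spectra.append([
            scatter * (1.0 + 0.05 * j + conc * math.exp(-((j - 10) / 3.0) ** 2))
            + rng.gauss(0.0, 0.01)
            for j in range(n_points)
        ])
        y.append(conc)
    return spectra, y


if __name__ == "__main__":
    spectra, y = make_demo_data()
    score, name, k = grid_search(spectra, y)
    print(f"best pre-processing: {name}, k={k}, CV RMSE={score:.4f}")
```

In practice the same search would more likely be run with an established library, and the candidate grid would span several pre-processing chains and several regressors rather than a single hand-written k-NN.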
Pages: 9