Optimizing feature selection with gradient boosting machines in PLS regression for predicting moisture and protein in multi-country corn kernels via NIR spectroscopy

Cited by: 29
Authors
Zheng, Runyu [1]
Jia, Yuyao [1]
Ullagaddi, Chidanand [2]
Allen, Cody [1]
Rausch, Kent [1]
Singh, Vijay [1]
Schnable, James C. [2]
Kamruzzaman, Mohammed [1]
Affiliations
[1] Univ Illinois Champaign Urbana, Dept Agr & Biol Engn, Urbana, IL 61801 USA
[2] Univ Nebraska Lincoln, Dept Agron & Hort, Lincoln, NE USA
Keywords
Gradient boosting machine (GBM); Feature selection; SHapley additive exPlanations (SHAP); Partial least squares regression (PLSR); Corn kernels; Near-infrared (NIR) spectroscopy; Component prediction; Least-squares regression; Variable selection
DOI
10.1016/j.foodchem.2024.140062
Chinese Library Classification (CLC) number
O69 [Applied chemistry]
Discipline classification code
081704
Abstract
Differences in moisture and protein content impact both nutritional value and processing efficiency of corn kernels. Near-infrared (NIR) spectroscopy can be used to estimate kernel composition, but models trained on a few environments may underestimate error rates and bias. We assembled corn samples from diverse international environments and used NIR with chemometrics and partial least squares regression (PLSR) to determine moisture and protein. The potential of five feature selection methods to improve prediction accuracy was assessed by extracting sensitive wavelengths. Gradient boosting machines (GBMs), particularly CatBoost and LightGBM, were found to effectively select crucial wavelengths for moisture (1409, 1900, 1908, 1932, 1953, 2174 nm) and protein (887, 1212, 1705, 1891, 2097, 2456 nm). SHAP plots highlighted significant wavelength contributions to model prediction. These results illustrate GBMs' effectiveness in feature engineering for agricultural and food sector applications, including developing multi-country global calibration models for moisture and protein in corn kernels.
Pages: 12