Multi-model fusion stacking ensemble learning method for the prediction of berberine by FT-NIR spectroscopy

Cited by: 12
Authors
Li, Xiaoyu [1]
Chen, Huazhou [1, 2, 4]
Xu, Lili [3]
Mo, Qiushuang [1]
Du, Xinrong [1]
Tang, Guoqiang [1, 2]
Affiliations
[1] Guilin Univ Technol, Sch Math & Stat, Guilin 541004, Peoples R China
[2] Guilin Univ Technol, Ctr Data Anal & Algorithm Technol, Guilin 541004, Peoples R China
[3] Beibu Gulf Univ, Coll Marine Sci, Qinzhou 535011, Peoples R China
[4] Guilin Univ Technol, Sch Math & Stat, 12 Jiangan Rd, Guilin 541004, Peoples R China
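Funding
National Natural Science Foundation of China
Keywords
FT-NIR spectroscopy; Berberine; Stacking ensemble learning; Particle swarm optimization algorithm; Adaptive inertia weight
DOI
10.1016/j.infrared.2024.105169
Chinese Library Classification (CLC) number
TH7 [Instruments and meters]
Subject classification codes
0804; 080401; 081102
Abstract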
Rhizoma Coptidis is a Chinese herbal medicine with antibacterial and anti-inflammatory properties and is widely used in modern medicine. Its quality is directly determined by its berberine content. Fourier transform near-infrared (FT-NIR) spectroscopy is a commonly used non-destructive method for rapidly detecting berberine content. Unlike a single supervised learning algorithm, ensemble learning combines several individual learners into a more stable, better-performing strong model. This study collected spectral data of Rhizoma Coptidis by FT-NIR spectroscopy and established a chemometric model using a multi-model stacking ensemble. Partial least squares (PLS), adaptive boosting (AdaBoost), gradient boosting decision tree (GBDT), random forest (RF), and extreme gradient boosting (XGBoost) regression models were chosen as candidate base models, and different stacking models were built from random combinations of them. To fully leverage the strengths of each model and enhance predictive capability, an adaptive inertia weight particle swarm optimization algorithm (AWPSO) was used to search for the optimal parameters. Model performance was systematically evaluated by the correlation coefficient (R_T) and the root mean square error (RMSE_T) on the test set. Finally, AWPSO-RF, AWPSO-XGBoost, and AWPSO-AdaBoost were selected as the base models. The RMSE_T values for RF, XGBoost, and AdaBoost were 0.226, 0.250, and 0.195, and the corresponding R_T values were 0.871, 0.830, and 0.927. After AWPSO optimization, the RMSE_T values for AWPSO-RF, AWPSO-XGBoost, and AWPSO-AdaBoost were 0.226, 0.245, and 0.194, and the R_T values were 0.871, 0.843, and 0.922, respectively. The stacking ensemble achieved an RMSE_T of 0.174 and an R_T of 0.932. The prediction accuracy and generalization ability of the multi-model fusion stacking ensemble are thus superior to those of the single-model regression methods.
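The abstract names AWPSO but does not give its update rule. The sketch below is a minimal, standard-library Python version of particle swarm optimization with one common fitness-based adaptive inertia weight: a particle whose fitness is better than the swarm average gets a weight scaled between w_min and w_max by its relative fitness, and every other particle gets w_max. The weight rule, the parameter values (w_min=0.4, w_max=0.9, c1=c2=1.5, velocity clamp at 20% of each range), and the function name `awpso` are assumptions for illustration, not the authors' settings. In the study, the objective would presumably be a validation error (such as RMSE) of a base model as a function of its hyperparameters; integer hyperparameters such as the number of trees would need rounding inside the objective.

```python
import random


def awpso(objective, bounds, n_particles=20, n_iter=100,
          w_min=0.4, w_max=0.9, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box using PSO with an adaptive inertia weight.

    Assumed weight rule (one common variant, not necessarily the paper's):
    a particle better than the swarm average gets a weight scaled between
    w_min and w_max by its relative fitness (exploitation); every other
    particle gets w_max (exploration).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    lo = [b[0] for b in bounds]
    hi = [b[1] for b in bounds]
    vmax = [0.2 * (h - l) for l, h in zip(lo, hi)]  # assumed velocity clamp

    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    fit = [objective(p) for p in pos]
    pbest = [p[:] for p in pos]
    pbest_fit = fit[:]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        f_min, f_avg = min(fit), sum(fit) / n_particles
        for i in range(n_particles):
            # Adaptive inertia weight; fit[i] < f_avg implies f_avg > f_min.
            if fit[i] < f_avg:
                w = w_min + (w_max - w_min) * (fit[i] - f_min) / (f_avg - f_min)
            else:
                w = w_max
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax[d], min(vmax[d], v))
                pos[i][d] = max(lo[d], min(hi[d], pos[i][d] + vel[i][d]))
        fit = [objective(p) for p in pos]
        # Update personal and global bests.
        for i in range(n_particles):
            if fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit[i]
                if fit[i] < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit[i]
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy check on the 2-D sphere function, whose minimum is 0 at the origin.
    best_x, best_f = awpso(lambda p: sum(v * v for v in p), [(-5.0, 5.0)] * 2)
    print(best_x, best_f)
```

Lowering the weight for above-average particles lets them refine the current best region, while keeping w_max for the rest preserves exploration; other AWPSO variants (for example, a weight that decreases with iteration count) would change only the `w` computation.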
Therefore, the stacking ensemble learning method that combines AdaBoost, RF, and XGBoost regression models is effective and feasible for assisting in the detection of berberine content in Rhizoma Coptidis.
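The stacking structure described above (tree-based base learners whose predictions are combined by a second-level learner) can be sketched with scikit-learn's `StackingRegressor`. This is an illustrative sketch, not the authors' pipeline: the data are synthetic (`make_regression`) rather than FT-NIR spectra; `GradientBoostingRegressor` (GBDT, another of the paper's candidate models) stands in for XGBoost to avoid an extra dependency; and the linear-regression meta-learner, 5-fold internal cross-validation, and all hyperparameters are assumptions, since the abstract does not specify them. The hypothetical helper `rmse_and_r` computes the two test-set metrics the abstract reports, RMSE_T and R_T (taken here as the Pearson correlation coefficient).

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def rmse_and_r(y_true, y_pred):
    """Return (RMSE, Pearson r) between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    return rmse, r


# Synthetic stand-in for spectra: 200 samples x 50 "wavelength" variables.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Level-0 (base) learners; GBDT stands in for XGBoost here.
base_models = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("ada", AdaBoostRegressor(n_estimators=100, random_state=0)),
    ("gbdt", GradientBoostingRegressor(random_state=0)),
]

# Level-1 (meta) learner fitted on 5-fold out-of-fold base predictions.
stack = StackingRegressor(estimators=base_models,
                          final_estimator=LinearRegression(), cv=5)

# Fit each base model alone for comparison, then the stacked ensemble.
results = {}
for name, model in base_models:
    fitted = clone(model).fit(X_train, y_train)
    results[name] = rmse_and_r(y_test, fitted.predict(X_test))
stack.fit(X_train, y_train)
results["stack"] = rmse_and_r(y_test, stack.predict(X_test))

for name, (rmse, r) in results.items():
    print(f"{name}: RMSE_T = {rmse:.3f}, R_T = {r:.3f}")
```

Because `StackingRegressor` trains the meta-learner on out-of-fold base-model predictions, the second level learns how to weight the base models without relying on predictions they made for their own training samples. If XGBoost is installed, `xgboost.XGBRegressor` can be placed in the estimator list in place of GBDT. The printed numbers come from synthetic data and say nothing about the paper's reported results.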
Pages: 10