Machine learning prediction of stalk lignin content using Fourier transform infrared spectroscopy in large scale maize germplasm

Cited by: 1
Authors
Wen, Yujing [1]
Liu, Xing [2]
He, Feng [1]
Shi, Yanli [1]
Chen, Fanghui [1]
Li, Wenfei [1]
Song, Youhong [3]
Li, Lin [4]
Jiang, Haiyang [1]
Zhou, Liang [2]
Wu, Leiming [1]
Affiliations
[1] Anhui Agr Univ, Sch Life Sci, Natl Engn Lab Crop Resistance Breeding, Hefei 230036, Peoples R China
[2] Anhui Agr Univ, Sch Mat & Chem, Hefei 230036, Anhui, Peoples R China
[3] Anhui Agr Univ, Sch Agron, Hefei 230036, Peoples R China
[4] Huazhong Agr Univ, Natl Key Lab Crop Genet Improvement, Hubei Hongshan Lab, Wuhan 430070, Peoples R China
Keywords
Maize; Lignin content; Fourier transform infrared spectroscopy; Machine learning; XGBoost; LightGBM; CELL-WALL POLYMERS; RIDGE-REGRESSION; SWEET SORGHUM; BIOMASS; DIGESTIBILITY; BIOETHANOL; SOFTWOOD; HARDWOOD; FEATURES; YIELD
DOI
10.1016/j.ijbiomac.2024.136140
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Lignin is recognized as a major contributor to lignocellulosic recalcitrance in biofuel production and has attracted attention as a high-value product in the biorefinery field. Because traditional wet-chemical methods for determining lignin content are labor-intensive, time-consuming, and environmentally toxic, there is an urgent need for high-throughput, environmentally friendly techniques for large-scale screening of crop germplasm. In this study, we conducted a Fourier transform infrared (FTIR) assay on 150 maize germplasms with diverse lignin composition to build predictive models for lignin content in maize stalk. Principal component analysis (PCA) was applied to the FTIR spectra to generate the model inputs. Classification and advanced gradient boosting machine (GBM) algorithms demonstrated higher predictive accuracy (0.82-0.96) than traditional linear and regularization algorithms (0.03-0.04) in the training set. Notably, two optimal models, built with the extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) algorithms, achieved R² values above 0.91 in the training set and above 0.82 in the test set. Overall, combining FTIR with machine learning (ML) algorithms offers a high-throughput and efficient method for predicting lignin content. This approach holds significant potential for genetic breeding and for the effective utilization of maize in industrial production.
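The abstract reports model quality as R2 (the coefficient of determination) on the training and test sets. As a minimal sketch of how that metric is computed, the example below uses the standard definition, 1 - SS_res / SS_tot. The function name `r_squared` and all numbers are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = fmean(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared spread of the observations around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up lignin contents and made-up model predictions (illustrative only).
measured = [18.0, 20.0, 22.0, 24.0]
predicted = [18.5, 19.5, 22.5, 23.5]

score = r_squared(measured, predicted)
# A model that always predicts the mean of the observations scores 0.
baseline = r_squared(measured, [fmean(measured)] * len(measured))

print(f"R^2 of the hypothetical predictions: {score:.2f}")
print(f"R^2 of always predicting the mean: {baseline:.2f}")
```

Under this definition, a value near 1 means the predictions explain most of the variance in the measured values, and a value near 0 means the model does no better than predicting the mean. A training-set value that is higher than the test-set value, as reported above, is the usual pattern, because the model has already seen the training samples.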
Pages: 9