Data-driven interpretable analysis for polysaccharide yield prediction

Times cited: 5
Authors
Tian, Yushi [1]
Yang, Xu [1]
Chen, Nianhua [1]
Li, Chunyan [1]
Yang, Wulin [2]
Affiliations
[1] Northeast Agr Univ, Sch Resource & Environm, Harbin 150030, Peoples R China
[2] Peking Univ, Coll Environm Sci & Engn, Beijing 100871, Peoples R China
Keywords
Cornstalk; Xylanase; Polysaccharide yield prediction; Machine learning; Model interpretability; STRUCTURAL-CHARACTERIZATION; RANDOM FORESTS; XYLOOLIGOSACCHARIDES
DOI
10.1016/j.ese.2023.100321
Chinese Library Classification number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Cornstalks show promise as a raw material for polysaccharide production through xylanase. Rapid and accurate prediction of polysaccharide yield can facilitate process optimization, eliminating the need for extensive experimentation in actual production to refine reaction conditions, thereby saving time and costs. However, the intricate interplay of enzymatic factors poses challenges in predicting and optimizing polysaccharide yield accurately. Here, we introduce an innovative data-driven approach leveraging multiple artificial intelligence techniques to enhance polysaccharide production. We propose a machine learning framework to identify highly accurate polysaccharide yield prediction modeling methods and uncover optimal enzymatic parameter combinations. Notably, Random Forest (RF) and eXtreme Gradient Boost (XGB) demonstrate robust performance, achieving prediction accuracies of 93.0% and 95.6%, respectively, while an independently developed deep neural network (DNN) model achieves 91.1% accuracy. A feature importance analysis of XGB reveals the enzyme solution volume's dominant role (43.7%), followed by time (20.7%), substrate concentration (15%), temperature (15%), and pH (5.6%). Further interpretability analysis unveils complex parameter interactions and potential optimization strategies. This data-driven approach, incorporating machine learning, deep learning, and interpretable analysis, offers a viable pathway for polysaccharide yield prediction and the potential recovery of various agricultural residues.
© 2023 The Authors. Published by Elsevier B.V. on behalf of Chinese Society for Environmental Sciences, Harbin Institute of Technology, Chinese Research Academy of Environmental Sciences. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 9