Transfer learning enables predictions in soil-borne diseases

Citations: 0
Authors
Xin, Lei [1]
Xie, Penghao [1]
Wen, Tao [1]
Niu, Guoqing [1]
Yuan, Jun [1]
Affiliations
[1] Nanjing Agr Univ, Jiangsu Collaborat Innovat Ctr Solid Organ Wastes, Educ Minist Engn Ctr Resource Saving Fertilizers, Key Lab Organ Based Fertilizers China,Jiangsu Prov, Nanjing 210095, Peoples R China
Keywords
soil disease; feature importance; heterogeneous integration strategy; transfer learning; APPORTIONMENT; BIOLOGY
DOI
10.1007/s42832-024-0258-y
Chinese Library Classification (CLC) number
Q14 [Ecology (bioecology)]
Discipline classification codes
071012; 0713
Abstract
The Transformer model precisely predicts soil health status from high-throughput sequencing data. The SMOTE algorithm addresses data imbalance issues, improving model accuracy. Transfer learning validates the model on small samples, strengthening its generalization capabilities.

Inhibiting the occurrence of soil-borne diseases is considered the most favorable approach to promoting sustainable agricultural development, and soil disease prediction models can serve precision agriculture. However, the analysis results of the metaframework often contradict each other, so the important features identified by machine learning are inconsistent. It is therefore necessary to compare the classification accuracy of various machine learning models and to further optimize their features to improve that accuracy. Here, we compared eight common machine learning algorithms (XGBoost, CatBoost, Decision Tree, LGBM, Naïve Bayes, Perceptron, Logistic, and Random Forest) at the taxonomic levels of family, genus, and class. The important features were extracted based on the differences in accuracy and in important features among the models, and were then interpreted using feature importance. Subsequently, the data were resampled using the SMOTE algorithm. The results show that the SMOTE-Transformer model performs well, surpassing the training results of the voting and stacking strategies, with an accuracy reaching 90%. We also deployed the SMOTE-Transformer model on sequencing data, where it achieves an accuracy of over 80%. The SMOTE-Transformer model offers a new approach to soil microbial data analysis by greatly improving the accuracy and robustness of soil microbial data processing tools.
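The SMOTE resampling step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `smote`, its parameters (`X_min`, `n_new`, `k`, `seed`), and the random toy abundance matrices are all assumptions made for illustration. It shows only the core idea of SMOTE (Chawla et al., 2002): each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples via SMOTE-style interpolation.

    Hypothetical helper for illustration; requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest minority neighbours of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for every synthetic point.
    base = rng.integers(0, n, size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position in [0, 1) along the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Toy data (assumed): 40 "healthy" and 8 "diseased" samples, 6 taxa each.
toy_rng = np.random.default_rng(1)
X_healthy = toy_rng.random((40, 6))
X_diseased = toy_rng.random((8, 6))

# Oversample the minority class until both classes have 40 samples.
X_new = smote(X_diseased, n_new=32, k=3)
X_diseased_balanced = np.vstack([X_diseased, X_new])
```

In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would normally be preferred over a hand-written version like this one.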
Pages: 13