Diagnosing crop diseases based on domain-adaptive pre-training BERT of electronic medical records

Cited by: 14
Authors
Ding, Junqi [1]
Li, Bo [2]
Xu, Chang [1]
Qiao, Yan [3]
Zhang, Lingxian [1, 4, 5]
Affiliations
[1] China Agr Univ, Beijing 100083, Peoples R China
[2] Beijing Informat Sci & Technol Univ, Sch Econ & Management, Beijing 100192, Peoples R China
[3] Beijing Plant Protect Stn, Beijing 100029, Peoples R China
[4] Minist Agr & Rural Affairs, Key Lab Agr Informationizat Standardizat, Beijing, Peoples R China
[5] China Agr Univ, Coll Informat & Elect Engn, 209 17 Qinghua Donglu, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Disease diagnosis; CEMRs; BERT; Domain-adaptive pre-training; RCNN; Classification; Pathogens
DOI
10.1007/s10489-022-04346-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Crop Electronic Medical Records (CEMRs) contain a rich diversity of information about disease characteristics, which makes them highly valuable for supporting plant doctors in diagnosing disease. However, mining CEMRs presents challenges that remain unstudied, such as the lack of publicly available datasets, unlabeled data, and the many agricultural and slang terms in the text. This study proposes a crop disease diagnosis model, CdsBERT-RCNN, based on Bidirectional Encoder Representations from Transformers (BERT) specific to the crop disease domain and RCNN. First, a crop disease corpus is constructed for domain-adaptive pre-training; second, semantic features of CEMRs are extracted by CdsBERT; third, distinct contextual information is further extracted and disease diagnosis is performed by the RCNN. A CEMR dataset containing 32 diseases was constructed to validate the model. Experiments showed that the proposed method could effectively diagnose crop diseases, with an F1-score of 85.63% and an accuracy of 85.65%. The proposed method outperformed widely used neural network models (CNN, DPCNN, RCNN, RNN, attention-based RNN, FastText, and Transformer) because self-supervised pre-training provides more information, and it outperformed general-domain pre-trained language models (BERT, ERNIE, XLNet, and RoBERTa) because its data distribution is better suited to the crop disease domain and its fine-tuning strategies are effective. Ablation studies further demonstrated the value of domain-adaptive pre-training and RCNN in the model. The results demonstrate the effectiveness of the framework for CEMR-based disease diagnosis, with potential applications in electronic medical record systems and artificial intelligence in crop disease management.
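The third step of the pipeline described in the abstract (an RCNN head over encoder features) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that RCNN here means the recurrent convolutional network commonly used for text classification, that plain tanh recurrences are an acceptable stand-in for the recurrent layer, and that random arrays can stand in for CdsBERT token features. All names (`rnn_pass`, `rcnn_head`, `init_params`) and all dimensions except the 32 disease classes are hypothetical.

```python
import numpy as np


def rnn_pass(x, w_in, w_rec, reverse=False):
    """Run a plain tanh recurrence over x (T, d_in); return states (T, d_hid)."""
    n_steps = len(x)
    order = range(n_steps - 1, -1, -1) if reverse else range(n_steps)
    h = np.zeros(w_rec.shape[0])
    states = np.zeros((n_steps, w_rec.shape[0]))
    for t in order:
        h = np.tanh(x[t] @ w_in + h @ w_rec)
        states[t] = h
    return states


def rcnn_head(token_features, params):
    """Map token features (T, d_in) to class probabilities (n_classes,)."""
    # Left-to-right and right-to-left context for every token.
    fwd = rnn_pass(token_features, params["w_in_f"], params["w_rec_f"])
    bwd = rnn_pass(token_features, params["w_in_b"], params["w_rec_b"], reverse=True)
    # Concatenate [left context; token feature; right context] per token.
    combined = np.concatenate([fwd, token_features, bwd], axis=1)
    latent = np.tanh(combined @ params["w_latent"])
    # Max-pooling over time keeps the strongest signal in each latent dimension.
    pooled = latent.max(axis=0)
    logits = pooled @ params["w_out"]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


def init_params(d_in, d_hid, d_latent, n_classes, seed=0):
    """Random, untrained weights; a real model would learn these by fine-tuning."""
    rng = np.random.default_rng(seed)

    def mat(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))

    return {
        "w_in_f": mat(d_in, d_hid), "w_rec_f": mat(d_hid, d_hid),
        "w_in_b": mat(d_in, d_hid), "w_rec_b": mat(d_hid, d_hid),
        "w_latent": mat(2 * d_hid + d_in, d_latent),
        "w_out": mat(d_latent, n_classes),
    }


# Stand-in for encoder output: 12 tokens with 16-dimensional features (assumed sizes).
features = np.random.default_rng(1).normal(size=(12, 16))
probs = rcnn_head(features, init_params(d_in=16, d_hid=8, d_latent=10, n_classes=32))
predicted_class = int(np.argmax(probs))
```

With untrained weights the predicted class is meaningless; the sketch only shows how per-token context is combined and pooled into a single distribution over the 32 diseases.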
Pages: 15979-15992
Page count: 14