Diagnosing crop diseases based on domain-adaptive pre-training BERT of electronic medical records

Cited by: 14
Authors
Ding, Junqi [1]
Li, Bo [2]
Xu, Chang [1]
Qiao, Yan [3]
Zhang, Lingxian [1, 4, 5]
Affiliations
[1] China Agr Univ, Beijing 100083, Peoples R China
[2] Beijing Informat Sci & Technol Univ, Sch Econ & Management, Beijing 100192, Peoples R China
[3] Beijing Plant Protect Stn, Beijing 100029, Peoples R China
[4] Minist Agr & Rural Affairs, Key Lab Agr Informationizat Standardizat, Beijing, Peoples R China
[5] China Agr Univ, Coll Informat & Elect Engn, 209 17 Qinghua Donglu, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Disease diagnosis; CEMRs; BERT; Domain-adaptive pre-training; RCNN; CLASSIFICATION; PATHOGENS;
DOI
10.1007/s10489-022-04346-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Crop Electronic Medical Records (CEMRs) contain rich and diverse information about disease characteristics, which is highly valuable in supporting plant doctors as they diagnose disease. However, mining CEMRs presents challenges that remain unstudied, such as the lack of publicly available datasets, unlabeled data, and the variety of agricultural and slang terms in the text. This study proposes a crop disease diagnosis model, CdsBERT-RCNN, which combines Bidirectional Encoder Representations from Transformers (BERT) adapted to the crop disease domain with a recurrent convolutional neural network (RCNN). First, a crop disease corpus is constructed for domain-adaptive pre-training; second, semantic features of CEMRs are extracted by CdsBERT; third, distinct contextual information is further extracted and the disease is diagnosed by the RCNN. A CEMR dataset covering 32 diseases was constructed to validate the model. Experiments showed that the proposed method could effectively diagnose crop diseases, with an F1-score of 85.63% and an accuracy of 85.65%. The proposed method outperformed widely used neural network models (CNN, DPCNN, RCNN, RNN, attention-based RNN, FastText, and Transformer), owing to the additional information obtained by self-supervised pre-training. It also outperformed general-domain pre-trained language models (BERT, ERNIE, XLNet, and RoBERTa), owing to a data distribution better suited to the crop disease domain and to effective fine-tuning strategies. Furthermore, ablation studies demonstrated the value of domain-adaptive pre-training and the RCNN in the model. The results demonstrated the effectiveness of the framework for CEMR-based disease diagnosis, with potential applications in electronic medical record systems and artificial intelligence for crop disease management.
Pages: 15979-15992
Number of pages: 14