Fine-tuned encoder models with data augmentation beat ChatGPT in agricultural named entity recognition and relation extraction

Cited by: 2
Authors
De, Sayan [1]
Sanyal, Debarshi Kumar [2]
Mukherjee, Imon [1]
Affiliations
[1] Indian Inst Informat Technol Kalyani, Kalyani 741235, India
[2] Indian Assoc Cultivat Sci, Jadavpur 700032, India
Keywords
Named entity recognition; Relation extraction; Knowledge graph; Text data augmentation
DOI
10.1016/j.eswa.2025.127126
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Agricultural research produces vast amounts of unstructured textual data, which remains largely underutilized for lack of robust tools for automated processing. If processed effectively, this data can provide critical insights to advance agricultural practices, decision-making, and sustainability. This work applies Named Entity Recognition (NER) and Relation Extraction (RE) to convert unstructured data into structured formats, addressing the challenges of domain-specific terminology and limited annotated datasets. Annotated data is scarce mainly because of the specialized terminology and contextual complexity of agricultural text. This study addresses these challenges by proposing data augmentation techniques, validated using large language models and human reviewers, to enhance the training data. We introduce AgNER-BERTa and AgRE-BERTa, two encoder-based models tailored for agricultural NER and RE tasks, and compare them with state-of-the-art (SOTA) baselines, including SciBERT, SpanBERT, and generative decoder models like ChatGPT. Our experiments demonstrate superior performance, achieving 98% accuracy for NER and 97% for RE, outperforming the SOTA models. The extracted entities and relations are used to construct the Agricultural Knowledge Graph (AgKG), providing structured, queryable insights to support precision agriculture, policy-making, and sustainable farming practices.
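The last step of the abstract, turning extracted entities and relations into a queryable graph, can be illustrated with a minimal sketch. This is not the authors' AgKG implementation: the triples, the relation names, and the helper functions `build_graph` and `query` are all hypothetical, and a plain dictionary stands in for whatever graph store the paper actually uses.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples, standing in for the output
# of an NER + RE pipeline. They are illustrative, not taken from the paper.
triples = [
    ("rice", "affected_by", "blast disease"),
    ("blast disease", "caused_by", "Magnaporthe oryzae"),
    ("rice", "grown_in", "West Bengal"),
]


def build_graph(triples):
    """Index triples by head entity as an adjacency list."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def query(graph, head, relation):
    """Return every tail entity linked to `head` by `relation`."""
    return [tail for rel, tail in graph.get(head, []) if rel == relation]


graph = build_graph(triples)
print(query(graph, "rice", "affected_by"))
```

Indexing by head entity keeps lookups of the form "what is X related to" cheap; a real knowledge graph would typically also index by tail and by relation type.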
Pages: 14