Agricultural large language model for standardized production of distinctive agricultural products

Cited by: 0
Authors
Yi, Wenlong [1 ]
Zhang, Li [1 ]
Kuzmin, Sergey [2 ]
Gerasimov, Igor [2 ]
Liu, Muhua [3 ]
Affiliations
[1] Jiangxi Agr Univ, Sch Software, Nanchang 330045, Peoples R China
[2] St Petersburg Electrotech Univ LETI, Fac Comp Sci & Technol, St Petersburg 197022, Russia
[3] Jiangxi Agr Univ, Sch Engn, Nanchang 330045, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Agricultural products; Standardization; Knowledge engineering; Large models; Retrieval augmentation;
DOI
10.1016/j.compag.2025.110218
Chinese Library Classification (CLC)
S [Agricultural Sciences];
Discipline code
09;
Abstract
To address the diverse nature of specialty agricultural product standardization, its complex and cumbersome development process, and lengthy drafting cycles, while simultaneously tackling challenges such as outdated standardization documents and hallucinations caused by general large language models' delayed access to agricultural domain information, this study constructs a multi-stage cascaded large language model based on a hybrid retrieval-augmented mechanism. The model comprises three core modules: (1) a multi-source retrieval augmentation module that achieves comprehensive external knowledge acquisition through vector retrieval, keyword retrieval, and knowledge graph retrieval branches; (2) a knowledge fusion module that filters redundant information using reciprocal rank fusion and graph structure pruning to inject precise, high-quality knowledge; (3) a domain adaptation module that enhances the model's understanding of agricultural terminology through vertical domain fine-tuning. Experimental results show that in the standardization document summarization task, the model achieves chrF, BERTScore, and Gscore metrics of 34.85, 74.88, and 39.85, respectively, representing improvements of 59.52%, 35.28%, and 72.84% over the BART baseline model, and 58.54%, 24.25%, and 59.54% over the T5 model. This study enriches the theoretical foundation of large language models in agriculture and provides intelligent technical support for specialty agricultural product standardization development.
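The knowledge fusion step described in module (2) combines ranked results from the three retrieval branches. A minimal sketch of reciprocal rank fusion, following the standard formulation (score(d) = Σ 1/(k + rank(d)) over the ranked lists), is shown below; the document identifiers and k = 60 default are illustrative assumptions, not details taken from the paper.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs.

    Each document's fused score is the sum of 1/(k + rank) over every
    list in which it appears (ranks are 1-based); higher is better.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Return document IDs ordered by descending fused score.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from the three retrieval branches:
vector_hits  = ["d1", "d2", "d3"]   # dense vector retrieval
keyword_hits = ["d2", "d4", "d1"]   # sparse keyword retrieval
graph_hits   = ["d3", "d2", "d5"]   # knowledge graph retrieval

fused = reciprocal_rank_fusion([vector_hits, keyword_hits, graph_hits])
# "d2" ranks first: it appears near the top of all three lists.
```

The constant k damps the influence of top-ranked outliers from any single branch, which is why RRF is robust without score normalization across heterogeneous retrievers.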
Pages: 15