Named Entity Recognition in Chinese Rice Breeding Questions Based on Text Data Augmentation

Cited by: 0
Authors
Niu, Peiyu [1]
Hou, Chen [2, 3]
Affiliations
[1] College of Information and Electrical Engineering, China Agricultural University, Beijing
[2] National Engineering Laboratory for Big Data Analysis and Applications, Peking University, Beijing
[3] PKU-Changsha Institute for Computing and Digital Economy, Changsha
Source
Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery | 2024, Vol. 55, No. 08
Keywords
knowledge graph; named entity recognition; question answering system; rice breeding; text data augmentation
DOI
10.6041/j.issn.1000-1298.2024.08.030
Abstract
Current rice breeding question answering systems suffer from low-level data management and coarse knowledge granularity. In addition, publicly available labeled data for named entity recognition in rice breeding is lacking, and manual annotation is costly. To address these issues, a named entity recognition approach based on text data augmentation was proposed for rice breeding questions. A rice breeding knowledge graph was created to help subdivide large named entity categories, such as rice characteristics entities, into smaller subcategories, such as resistance to abiotic stress and eating quality, which sharpened entity boundaries and refined knowledge granularity. To counter the suboptimal recognition performance caused by the high cost of annotating rice breeding data, the DA-BERT-BiLSTM-CRF model was presented, which introduces a data augmentation layer into the BERT-BiLSTM-CRF model. Using manually labeled rice breeding questions as training data, the proposed model was compared with three baseline models. In the overall named entity recognition experiment under the small-class entity division, the model achieved a precision of 93.86%, a recall of 92.82%, and an F1 score of 93.34%. Compared with BERT-BiLSTM-CRF, the best-performing of the three baseline models, these figures were higher by 4.98, 5.3, and 5.15 percentage points, respectively. The model also performed better on the single-entity recognition metric, achieving a precision of 94.26% and an F1 score of 93.32%. The experiments showed that the proposed approach performed better in both overall named entity recognition and single-class named entity recognition tasks on rice breeding questions. © 2024 Chinese Society of Agricultural Machinery. All rights reserved.
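As a quick consistency check on the reported figures, the short Python sketch below recomputes the F1 score as the harmonic mean of precision and recall. The function name `f1_score` and the variable names are illustrative, and the baseline values are not stated in the abstract: they are derived here by subtracting the reported gains from the proposed model's scores.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract for the proposed model (overall NER).
precision, recall, reported_f1 = 93.86, 92.82, 93.34
computed_f1 = round(f1_score(precision, recall), 2)

# Implied baseline scores (an assumption: reported score minus reported gain).
baseline_precision = round(precision - 4.98, 2)
baseline_recall = round(recall - 5.3, 2)
baseline_f1_from_gain = round(reported_f1 - 5.15, 2)
baseline_f1_computed = round(f1_score(baseline_precision, baseline_recall), 2)

print(computed_f1, reported_f1)
print(baseline_f1_computed, baseline_f1_from_gain)
```

Under these assumptions the recomputed F1 agrees with the reported value to two decimal places, and the implied baseline precision and recall give the same F1 as the one implied by the reported 5.15-point gain.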
Pages: 333-343
Number of pages: 10