AGRONER: An unsupervised agriculture named entity recognition using weighted distributional semantic model

Times cited: 16
Authors
Veena, G. [1]
Kanjirangat, Vani [2]
Gupta, Deepa [3]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Amrita School of Engineering, Department of Computer Science and Applications, Amritapuri, India
[2] Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), USI, Lugano, Switzerland
[3] Amrita Vishwa Vidyapeetham, Amrita School of Engineering, Department of Computer Science and Engineering, Bengaluru, India
Keywords
Unsupervised approach; Named entity recognition; BERT; Agriculture; Topic modeling; LDA; Extraction; Level
DOI
10.1016/j.eswa.2023.120440
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we propose a novel weighted distributional semantic model for unsupervised Named Entity Recognition (NER) in domain-specific texts, focusing on the agricultural domain. Developing accurate agricultural NER models requires overcoming several challenges, including the lack of annotated data, domain-specific vocabulary, entity ambiguity, and contextual variation. The proposed approach is completely unsupervised and uses an extended BERT model combined with LDA topic modeling (exBERT_LDA+) for NER. The proposed Agricultural Named Entity Recognition (AGRONER) model identifies six major entity types: disease, soil, pathogen, pesticide, crop, and place. The first four are recognized by the proposed algorithm, while the AGROVOC dictionary is used for crops and geocoding APIs are used for places. Because no benchmark dataset exists for the agricultural domain, we created a corpus of 30,000 sentences extracted from recognized agricultural websites. For evaluation, we used a test corpus of 700 sentences containing 1,690 entity names, and the labeled entities were manually checked to assess prediction accuracy. The proposed approach achieves a macro-averaged F-measure of 80.43%, which is promising for unsupervised, domain-specific entity labeling. In ablation studies, the proposed model showed relative F-measure improvements of 31.56% and 26.11% over BERT without LDA (BERT_LDA-) and extended BERT without LDA (exBERT_LDA-), respectively. Experimental results show the efficacy of the proposed approach in labeling named entities in an unsupervised setting for the agricultural domain. Further, the approach can be easily extended to recognize additional domain-specific entities.
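The abstract reports a macro-averaged F-measure and relative F-measure improvements from the ablation study. The minimal Python sketch below shows how both numbers are typically computed, assuming the standard definitions: per-entity-type F1 averaged without weighting, and relative improvement as (new − baseline) / baseline × 100. The function names and the per-type counts are hypothetical illustrations, not the authors' evaluation code or data.

```python
from __future__ import annotations


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f_measure(counts: dict[str, tuple[int, int, int]]) -> float:
    """Unweighted mean of the per-entity-type F-measures (macro average)."""
    scores = [f_measure(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, expressed in percent (assumed definition)."""
    return (new - baseline) / baseline * 100


# Hypothetical (tp, fp, fn) counts per entity type; NOT the paper's data.
toy_counts = {
    "disease": (8, 2, 2),    # P = 0.80, R = 0.80, F = 0.80
    "pesticide": (6, 2, 4),  # P = 0.75, R = 0.60, F ≈ 0.667
}

macro_f = macro_f_measure(toy_counts)
print(f"macro F = {macro_f:.4f}")
print(f"relative improvement = {relative_improvement(0.80, 0.60):.2f}%")
```

Macro averaging weights each entity type equally regardless of how often it occurs, so a rare type counts as much as a frequent one; pooling the counts across types before computing F1 (micro averaging) would instead favor the frequent types.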
Pages: 20