DeepSpacy-NER: an efficient deep learning model for named entity recognition for Punjabi language

Cited by: 6
Authors
Singh, Navdeep [1]
Kumar, Munish [2]
Singh, Bavalpreet [3]
Singh, Jaskaran [3]
Affiliations
[1] Department of Computer Science and Engineering, Punjabi University, Patiala, Punjab, India
[2] Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, Bathinda, Punjab, India
[3] Tatras Data Services Pvt Ltd, Mohali, Punjab, India
Keywords
Named-entity-recognition; spaCy; Annotations; Gurmukhi
DOI
10.1007/s12530-022-09453-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Named entity recognition (NER) is a technique for extracting named entities from text and classifying them into entity types. Much of the existing research on Punjabi has focused on the language's Shahmukhi script, with less attention paid to the Gurmukhi script. This paper proposes a novel technique for extracting named entities from Punjabi sentences written in the Gurmukhi script and classifying them into six entity types. The work uses 15k sentences drawn from the Punjabi portion of the Indic data corpus and from various newspapers, annotated with Doccano, an open-source annotation tool. The authors also propose and publicly release an annotated benchmark corpus for the Gurmukhi script. The model was trained with the spaCy framework on only 12k sentences selected at random from the Punjabi data corpus, and it was validated on the remaining 3k sentences using the F1-score as the evaluation metric. The article analyzes the experimental results and discusses the technique in detail.
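The abstract describes a random 12k/3k train/validation split scored by F1. The sketch below illustrates one common way to do both: a seeded shuffle-and-split, and micro-averaged entity-level F1 that counts a predicted entity as correct only when its character span and its label both exactly match the gold annotation (comparable in spirit to the `ents_f` score that spaCy's built-in scorer reports). It is an illustration under assumptions, not the authors' evaluation code: the seed, the helper names `split_corpus` and `entity_f1`, and the labels `PER`, `LOC`, and `ORG` are hypothetical placeholders, since the abstract names neither the six entity types nor the exact scoring procedure.

```python
import random
from typing import Iterable, List, Sequence, Set, Tuple

# An entity is a (start_char, end_char, label) triple, i.e. character-offset
# spans of the kind produced by span-annotation tools such as Doccano.
Entity = Tuple[int, int, str]


def split_corpus(sentences: Iterable, n_train: int = 12_000, seed: int = 0):
    """Shuffle a copy of the corpus and split it into train / held-out parts.

    Hypothetical helper: the seed and the fixed 12k cut-off are assumptions
    chosen to mirror the 12k/3k split described in the abstract.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    return shuffled[:n_train], shuffled[n_train:]


def entity_f1(gold: Sequence[Set[Entity]], pred: Sequence[Set[Entity]]):
    """Micro-averaged precision, recall and F1 over exact (span, label) matches.

    gold[i] and pred[i] hold the entity sets for sentence i.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # same span and same label
        fp += len(p - g)  # predicted but not in the gold annotation
        fn += len(g - p)  # gold entity the model missed or mislabelled
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    train, held_out = split_corpus(range(15_000))
    print(len(train), len(held_out))  # 12000 3000

    # One sentence: the first entity is correct, the second has the right span
    # but the wrong (placeholder) label, so it counts as both an FP and an FN.
    gold: List[Set[Entity]] = [{(0, 5, "PER"), (10, 16, "LOC")}]
    pred: List[Set[Entity]] = [{(0, 5, "PER"), (10, 16, "ORG")}]
    print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Exact-match scoring is strict: a correct label on a span that is off by one character earns no credit. Partial-overlap schemes exist, but exact match is the usual convention for reporting NER F1.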
Pages: 673–683
Number of pages: 11