Named Entity Recognition and Classification for Punjabi Shahmukhi

Cited by: 14
Authors
Ahmad, Muhammad Tayyab [1, 2]
Malik, Muhammad Kamran [1, 2]
Shahzad, Khurram [1, 2]
Aslam, Faisal [1, 2]
Iqbal, Asif [1, 2]
Nawaz, Zubair [1, 2]
Bukhari, Faisal [1, 2]
Affiliations
[1] Punjab Univ Coll Informat Technol, Lahore, Pakistan
[2] Univ Punjab, Punjab Univ Coll Informat Technol, New Campus, Lahore, Pakistan
Keywords
Low-resource languages; Asian languages; Punjabi; Shahmukhi; named entity recognition
DOI
10.1145/3383306
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Named entity recognition (NER) refers to identifying proper nouns in natural language text and classifying them into named entity types, such as person, location, and organization. Due to the widespread applications of NER, numerous NER techniques and benchmark datasets have been developed for both Western and Asian languages. Even though the Shahmukhi script of the Punjabi language is used by nearly three-fourths of Punjabi speakers worldwide, Gurmukhi has been the main focus of research activities. In particular, no benchmark NER corpus exists for Shahmukhi, which has held back NER research for the Shahmukhi script. To this end, this article presents the development and specifications of the first-ever NER corpus for Shahmukhi. The newly developed corpus is composed of 318,275 tokens and 16,300 named entities, including 11,147 persons, 3,140 locations, and 2,013 organizations. To establish the strength of our corpus, we have compared its specifications with those of its Gurmukhi counterparts. Furthermore, we have demonstrated the usability of our corpus using five supervised learning techniques, including two state-of-the-art deep learning techniques. The results are compared, and valuable insights about the behavior of the most effective technique are discussed.
Pages: 13