NAMED-ENTITY RECOGNITION FOR HINDI LANGUAGE USING CONTEXT PATTERN-BASED MAXIMUM ENTROPY

Cited by: 2
Authors
Jain, Arti [1]
Yadav, Divakar [2]
Arora, Anuja [1]
Tayal, Devendra K. [3]
Affiliations
[1] Jaypee Inst Informat Technol, Noida, Uttar Pradesh, India
[2] NIT Hamirpur, Hamirpur, Himachal Pradesh, India
[3] Indira Gandhi Delhi Tech Univ Women, New Delhi, India
Source
COMPUTER SCIENCE-AGH | 2022 / Vol. 23 / No. 01
Keywords
context patterns; gazetteer lists; Hindi language; Kaggle dataset; maximum entropy; named-entity recognition; feature extension; HYBRID APPROACH; SYSTEM;
DOI
10.7494/csci.2022.23.1.3977
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This paper describes a named-entity recognition (NER) system for the Hindi language that uses two methodologies: an existing baseline maximum entropy-based named-entity (BL-MENE) model and the proposed context pattern-based MENE (CP-MENE) framework. BL-MENE uses several baseline features for the NER task but suffers from inaccurate named-entity (NE) boundary detection, misclassification errors, and partial recognition of NEs because certain essentials are missing. CP-MENE, by contrast, incorporates extensive features and patterns designed to overcome these problems. Its features include right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative pronoun features. CP-MENE formulates a kind of recursive relationship for extracting highly ranked NE patterns, which are generated through regular expressions in Python code. Because Hindi-language web content is growing (especially in health care applications), this work is conducted on the Hindi health data (HHD) corpus, which is readily available as a Kaggle dataset. The experiments cover four NE categories: Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP).
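The idea of ranking context patterns and applying them through regular expressions can be illustrated with a minimal sketch. This is not the authors' CP-MENE implementation: the toy sentences (transliterated Hindi), the function names build_patterns and tag, the frequency-based ranking, and the min_count threshold are all assumptions made for illustration, and the sketch captures single-token entities only.

```python
import re
from collections import Counter

# Hypothetical toy annotations: (tokens, NE start, NE end (exclusive), label).
# Transliterated Hindi for readability; NOT taken from the HHD corpus.
TRAIN = [
    ("mujhe kal se bukhar hai".split(), 3, 4, "SMP"),
    ("use kal se khansi hai".split(), 3, 4, "SMP"),
    ("rogi ko madhumeh rog hai".split(), 2, 3, "DIS"),
    ("pita ko madhumeh rog hai".split(), 2, 3, "DIS"),
    ("bachche ko khansi hai".split(), 2, 3, "SMP"),
]


def build_patterns(train, min_count=2):
    """Count (left word, right word, label) contexts and keep the frequent ones.

    Assumption: patterns are ranked by raw frequency; the paper's recursive
    ranking scheme is not reproduced here.
    """
    counts = Counter()
    for tokens, start, end, label in train:
        if start == 0 or end >= len(tokens):
            continue  # skip entities that lack a left or right boundary word
        counts[(tokens[start - 1], tokens[end], label)] += 1

    patterns = []
    for (left, right, label), n in counts.most_common():
        if n < min_count:
            continue
        # \S+ is used instead of \w+ so the same regex would also work on
        # Devanagari text, where vowel signs are not matched by \w.
        regex = re.compile(
            rf"(?<!\S){re.escape(left)}\s+(\S+)\s+{re.escape(right)}(?!\S)"
        )
        patterns.append((regex, label))
    return patterns


def tag(sentence, patterns=None):
    """Return (entity, label) pairs found by the ranked context patterns."""
    if patterns is None:
        patterns = build_patterns(TRAIN)
    found = []
    for regex, label in patterns:
        for match in regex.finditer(sentence):
            found.append((match.group(1), label))
    return found


if __name__ == "__main__":
    print(tag("mujhe do din se sirdard hai"))  # [('sirdard', 'SMP')]
    print(tag("dada ko dama rog hai"))         # [('dama', 'DIS')]
```

In a maximum entropy setting, matches like these would typically be supplied to the classifier as additional features alongside boundary, part-of-speech, and gazetteer features rather than used as final decisions.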
Pages: 81-115
Page count: 35