NAMED-ENTITY RECOGNITION FOR HINDI LANGUAGE USING CONTEXT PATTERN-BASED MAXIMUM ENTROPY

Cited by: 2
Authors
Jain, Arti [1]
Yadav, Divakar [2]
Arora, Anuja [1]
Tayal, Devendra K. [3]
Affiliations
[1] Jaypee Inst Informat Technol, Noida, Uttar Pradesh, India
[2] NIT Hamirpur, Hamirpur, Himachal Pradesh, India
[3] Indira Gandhi Delhi Tech Univ Women, New Delhi, India
Source
COMPUTER SCIENCE-AGH | 2022 / Vol. 23 / No. 01
Keywords
context patterns; gazetteer lists; Hindi language; Kaggle dataset; maximum entropy; named-entity recognition; feature extension; hybrid approach; system
DOI
10.7494/csci.2022.23.1.3977
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper describes a named-entity recognition (NER) system for the Hindi language that uses two methodologies: an existing baseline maximum entropy-based named-entity (BL-MENE) model and the proposed context pattern-based MENE (CP-MENE) framework. BL-MENE utilizes several baseline features for the NER task but suffers from inaccurate named-entity (NE) boundary detection, misclassification errors, and the partial recognition of NEs due to certain missing essentials. By contrast, CP-MENE incorporates extensive features and patterns that are designed to overcome these problems. CP-MENE's features include right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative pronoun features. CP-MENE formulates a kind of recursive relationship for extracting highly ranked NE patterns that are generated through regular expressions in Python code. Since Hindi web content is growing (especially in health care applications), this work is conducted on the Hindi health data (HHD) corpus, which is readily available as a Kaggle dataset. Our experiments were conducted on four NE categories, namely Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP).
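The abstract names boundary and gazetteer features and regular-expression context patterns without giving details. The following is only an illustrative sketch of what such features could look like, not the authors' implementation: the gazetteer entries, the cue word, and the names `DISEASE_GAZETTEER`, `RIGHT_CUE`, `token_features`, and `pattern_candidates` are all hypothetical.

```python
import re

# Hypothetical gazetteer; the paper's actual lists are not reproduced here.
DISEASE_GAZETTEER = {"मलेरिया", "मधुमेह"}

# Hypothetical context pattern: any token immediately followed by the
# cue word "रोग" ("disease") is proposed as a candidate entity.
RIGHT_CUE = re.compile(r"(\S+)\s+रोग")


def token_features(tokens, i, window=1):
    """Build a feature dict for tokens[i]: the word itself, a gazetteer
    flag, and the left/right context words within the given window."""
    feats = {
        "word": tokens[i],
        "in_disease_gazetteer": tokens[i] in DISEASE_GAZETTEER,
    }
    for k in range(1, window + 1):
        feats[f"left_{k}"] = tokens[i - k] if i - k >= 0 else "<BOS>"
        feats[f"right_{k}"] = tokens[i + k] if i + k < len(tokens) else "<EOS>"
    return feats


def pattern_candidates(sentence):
    """Return the tokens that appear directly before the cue word."""
    return RIGHT_CUE.findall(sentence)
```

Feature dictionaries of this kind could then be passed to any maximum entropy classifier (equivalently, multinomial logistic regression); which toolkit the authors used is not stated in the abstract.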
Pages: 81-115
Number of pages: 35