NAMED-ENTITY RECOGNITION FOR HINDI LANGUAGE USING CONTEXT PATTERN-BASED MAXIMUM ENTROPY

Cited by: 2
Authors
Jain, Arti [1]
Yadav, Divakar [2]
Arora, Anuja [1]
Tayal, Devendra K. [3]
Affiliations
[1] Jaypee Inst Informat Technol, Noida, Uttar Pradesh, India
[2] NIT Hamirpur, Hamirpur, Himachal Pradesh, India
[3] Indira Gandhi Delhi Tech Univ Women, New Delhi, India
Source
COMPUTER SCIENCE-AGH | 2022 / Vol. 23 / No. 01
Keywords
context patterns; gazetteer lists; Hindi language; Kaggle dataset; maximum entropy; named-entity recognition; feature extension; hybrid approach; system
DOI
10.7494/csci.2022.23.1.3977
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper describes a named-entity recognition (NER) system for the Hindi language that uses two methodologies: an existing baseline maximum entropy-based named-entity (BL-MENE) model and the proposed context pattern-based MENE (CP-MENE) framework. BL-MENE utilizes several baseline features for the NER task but suffers from inaccurate named-entity (NE) boundary detection, misclassification errors, and the partial recognition of NEs due to certain missing essentials. The CP-MENE-based NER task, by contrast, incorporates extensive features and patterns designed to overcome these problems. CP-MENE's features include right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative pronoun features. CP-MENE formulates a kind of recursive relationship for extracting highly ranked NE patterns, which are generated through regular expressions in Python code. Since Hindi-language web content is growing (especially in health care applications), this work is conducted on the Hindi health data (HHD) corpus, which is readily available as a Kaggle dataset. The experiments were conducted on four NE categories: Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP).
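To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of boundary and gazetteer features feeding a maximum-entropy (softmax) scorer, together with one regular-expression context pattern. It is not the authors' CP-MENE implementation: the function names, the toy gazetteer, the feature weights, and the pattern are all hypothetical and chosen only for illustration.

```python
import math
import re

# Hypothetical toy gazetteer; the paper's actual lists are not reproduced here.
DISEASE_GAZETTEER = {"मलेरिया", "मधुमेह"}

def token_features(tokens, i, window=1):
    """Binary features for tokens[i]: the word, its left/right context, gazetteer membership."""
    feats = {f"word={tokens[i]}"}
    for k in range(1, window + 1):
        left = tokens[i - k] if i - k >= 0 else "<BOS>"
        right = tokens[i + k] if i + k < len(tokens) else "<EOS>"
        feats.add(f"left{k}={left}")
        feats.add(f"right{k}={right}")
    if tokens[i] in DISEASE_GAZETTEER:
        feats.add("in_disease_gazetteer")
    return feats

def maxent_probs(feats, weights, labels):
    """Maximum-entropy distribution: p(y|x) is proportional to
    exp(sum of the weights of the active (feature, label) pairs)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}

# Illustrative context pattern (an assumption, not taken from the paper):
# a token followed by "के लक्षण" ("symptoms of") is a disease candidate.
DISEASE_CONTEXT = re.compile(r"(\S+)\s+के\s+लक्षण")

sentence = "मलेरिया के लक्षण बुखार और सिरदर्द हैं"
tokens = sentence.split()

feats = token_features(tokens, 0)
# Hand-set toy weights; a real model would learn these from annotated data.
weights = {("in_disease_gazetteer", "DIS"): 2.0, ("right1=के", "DIS"): 0.5}
probs = maxent_probs(feats, weights, ["DIS", "SMP", "O"])

candidates = DISEASE_CONTEXT.findall(sentence)
```

In this sketch the gazetteer hit and the right-boundary word both push the first token toward the DIS label, and the regular expression independently proposes the same token as a disease candidate. A trained system would estimate the weights from a corpus such as HHD rather than setting them by hand.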
Pages: 81-115
Page count: 35