NAMED-ENTITY RECOGNITION FOR HINDI LANGUAGE USING CONTEXT PATTERN-BASED MAXIMUM ENTROPY

Cited by: 2

Authors:

Jain, Arti [1]

Yadav, Divakar [2]

Arora, Anuja [1]

Tayal, Devendra K. [3]

Affiliations:

[1] Jaypee Inst Informat Technol, Noida, Uttar Pradesh, India

[2] NIT Hamirpur, Hamirpur, Himachal Pradesh, India

[3] Indira Gandhi Delhi Tech Univ Women, New Delhi, India

Source:

COMPUTER SCIENCE-AGH | 2022 / Vol. 23 / No. 01

Keywords:

context patterns; gazetteer lists; Hindi language; Kaggle dataset; maximum entropy; named-entity recognition; feature extension; hybrid approach; system

DOI:

10.7494/csci.2022.23.1.3977

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

This paper describes a named-entity recognition (NER) system for the Hindi language that uses two methodologies: an existing baseline maximum entropy-based named-entity (BL-MENE) model and the proposed context pattern-based MENE (CP-MENE) framework. BL-MENE uses several baseline features for the NER task but suffers from inaccurate named-entity (NE) boundary detection, misclassification errors, and partial recognition of NEs because certain essentials are missing. CP-MENE, in contrast, incorporates extensive features and patterns designed to overcome these problems. Its features include right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative pronoun features. CP-MENE formulates a kind of recursive relationship for extracting highly ranked NE patterns, which are generated through regular expressions in Python code. Since Hindi web content is growing (especially in health care applications), this work is conducted on the Hindi health data (HHD) corpus, which is available as a Kaggle dataset. The experiments were conducted on four NE categories: Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP).
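The abstract's idea of generating and ranking NE context patterns with regular expressions can be illustrated with a minimal sketch. This is not the authors' code and does not reproduce their recursive formulation: the inline `<DIS>…</DIS>` annotation format, the toy English sentences, and the function names `left_context_patterns` and `apply_pattern` are all assumptions made for illustration. It only shows the general technique of counting left-boundary contexts of tagged entities and reusing the most frequent one to propose new candidates.

```python
import re
from collections import Counter

# Hypothetical toy corpus: entities are marked inline as <TAG>...</TAG>.
# (English placeholders stand in for Hindi text; the format is an assumption.)
SENTENCES = [
    "patient suffers from <DIS>malaria</DIS> since monday",
    "he suffers from <DIS>dengue</DIS> and fever",
    "doctor treated <DIS>malaria</DIS> with medicine",
]


def left_context_patterns(sentences, tag, window=2):
    """Count the (up to `window`) words that precede each tagged entity."""
    pattern = re.compile(
        rf"(?<!\S)((?:\S+\s+){{1,{window}}})<{tag}>(.+?)</{tag}>"
    )
    counts = Counter()
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            counts[match.group(1).strip()] += 1
    return counts


def apply_pattern(left_context, text):
    """Return the word following each occurrence of a left-context pattern."""
    words = [re.escape(w) for w in left_context.split()]
    pattern = re.compile(r"(?<!\S)" + r"\s+".join(words) + r"\s+(\S+)")
    return pattern.findall(text)


if __name__ == "__main__":
    ranked = left_context_patterns(SENTENCES, "DIS").most_common()
    print(ranked)
    best_context, _ = ranked[0]
    # Use the highest-ranked context to propose a candidate in untagged text.
    print(apply_pattern(best_context, "she suffers from typhoid now"))
```

Ranking by raw frequency is the simplest possible choice here; a real system would likely combine such pattern evidence with the other features listed in the abstract inside the maximum entropy model.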


Pages: 81-115

Page count: 35
