Named Entity Recognition in Hindi Using Hyperspace Analogue to Language and Conditional Random Field

Times cited: 0
Authors
Jain, Arti [1]
Arora, Anuja [1]
Affiliation
[1] Jaypee Institute of Information Technology, Department of CSE & IT, Noida 201309, Uttar Pradesh, India
Source
Pertanika Journal of Science and Technology | 2018 / Vol. 26 / No. 4
Keywords
Conditional Random Field; Hindi; Hyperspace Analogue to Language; Named Entity Recognition
DOI
Not available
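Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09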
Abstract
Named Entity Recognition (NER) is defined as the identification and classification of Named Entities (NEs) into a set of well-defined categories. Many rule-based, machine-learning-based and hybrid approaches have been devised for NER, particularly for the English language. For Hindi, however, several perplexing challenges arise, and these are detailed in this paper. A new approach is proposed that performs Hindi NE Recognition using semantic properties to handle some of the Hindi-specific NER challenges. Because of the increasing demand for Hindi health care applications, Hindi Health Data (HHD) is crawled from four well-known Indian websites: Traditional Knowledge Digital Library; Ministry of Ayush; University of Patanjali; and Linguistic Data Consortium for Indian Languages. Four novel NE types are defined, namely Person NE, Disease NE, Symptom NE and Consumable NE. For training, the HHD data is converted into Hyperspace Analogue to Language (HAL) vectors, thereby mapping each word into a high-dimensional space. A Conditional Random Field (CRF) model is then applied based on HHD feature engineering, HHD gazetteers and HAL. Blind test data is mapped into the high-dimensional space created during the training phase, and the model outputs the annotated test data. The results obtained are significant, and HAL combined with CRF appears to provide effective outcomes for Hindi NE Recognition.
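The abstract says that training text is converted into HAL vectors that place each word in a high-dimensional space. It also says that blind test data is mapped into the space built during training. The sketch below illustrates that idea and rests on several assumptions. It uses the usual HAL weighting, where a context word at distance d inside a window of size w adds w - d + 1. It concatenates left-context and right-context counts into one 2n-dimensional vector. It fixes the dimensions from the training vocabulary, so context words unseen in training are skipped. The window size, the toy transliterated tokens, the helper names (`build_index`, `hal_vectors`, `cosine`) and the cosine nearest-neighbour step are all hypothetical choices, not the paper's documented settings. The CRF stage is not shown.

```python
import math
from collections import defaultdict


def build_index(tokens):
    """Fix the dimensions of the HAL space from the training vocabulary."""
    return {w: i for i, w in enumerate(sorted(set(tokens)))}


def hal_vectors(tokens, index, window=5):
    """Map every word in `tokens` to a 2n-dimensional HAL vector (n = len(index)).

    Entries 0..n-1 weight context words that precede the target and entries
    n..2n-1 weight context words that follow it. A context word at distance d
    (1 <= d <= window) adds window - d + 1, so nearer words count more.
    Context words absent from `index` are skipped (an assumption), which
    projects unseen test text into the space fixed at training time.
    """
    n = len(index)
    vectors = defaultdict(lambda: [0] * (2 * n))
    for pos, target in enumerate(tokens):
        vec = vectors[target]  # repeated occurrences of a word accumulate
        for d in range(1, window + 1):
            weight = window - d + 1
            if pos - d >= 0 and tokens[pos - d] in index:
                vec[index[tokens[pos - d]]] += weight
            if pos + d < len(tokens) and tokens[pos + d] in index:
                vec[n + index[tokens[pos + d]]] += weight
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy data: transliterated Hindi tokens, not taken from HHD.
train = "rogi ko bukhar aur khansi hai".split()
index = build_index(train)
train_vecs = hal_vectors(train, index, window=2)

# "jukam" never occurs in training, but its neighbours do.
test_vecs = hal_vectors("rogi ko jukam hai".split(), index, window=2)
nearest = max(train_vecs, key=lambda w: cosine(test_vecs["jukam"], train_vecs[w]))
print(nearest)  # bukhar
```

In this toy run, the unseen word "jukam" lands closest to "bukhar" because the two share neighbours. This is one plausible way semantic proximity in HAL space could complement CRF features for out-of-vocabulary entities. How the paper actually feeds HAL information into its CRF is not specified here.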
Pages: 1801-1822
Number of pages: 22