Named Entity Recognition and transliteration in Bengali

Cited by: 1
Authors
Ekbal, Asif [1]
Naskar, Sudip Kumar [1]
Bandyopadhyay, Sivaji [1]
Affiliations
[1] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata 700032, W Bengal, India
Source
LINGUISTICAE INVESTIGATIONES | 2007 / Vol. 30 / No. 1
Keywords
named entity recognition; HMM based approach; named entity transliteration; Modified Joint Source-Channel Model; Bengali;
DOI
Not available
Chinese Library Classification
H [Language, Linguistics];
Subject classification code
05 ;
Abstract
The paper reports the development of a Named Entity Recognition (NER) system for Bengali using a tagged Bengali news corpus, and the subsequent transliteration of the recognized Bengali Named Entities (NEs) into English. Three NER models have been developed. The first two use a semi-supervised learning method, one without linguistic features (Model A) and the other with linguistic features (Model B). The third (Model C) is based on a statistical Hidden Markov Model. A modified joint source-channel model has been used, along with a number of alternatives, to generate English transliterations of Bengali NEs and vice versa. The transliteration models learn the mappings from bilingual training sets, optionally guided by linguistic knowledge in the form of conjuncts and diphthongs in Bengali and their representations in English. The NER system achieved its highest average Recall, Precision and F-Score values, 89.62%, 78.67% and 83.79% respectively, with Model C. Evaluation of the proposed transliteration models showed that the modified joint source-channel model performs best on the evaluation metrics for person and location names in both Bengali-to-English (B2E) and English-to-Bengali (E2B) transliteration. Using linguistic knowledge during training of the transliteration models improves performance.
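As a quick consistency check on the figures reported for Model C, the sketch below computes the balanced F-Score as the harmonic mean of Precision and Recall. The abstract does not state which F-Score variant the authors used or how the averages were taken, so the balanced (F1) form and the function name `f_score` are assumptions made for illustration only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-Score: harmonic mean of precision and recall.

    Assumption: the abstract's F-Score is the balanced (F1) variant;
    the record itself does not say so.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average Precision and Recall reported for Model C, in percent.
model_c_f = f_score(precision=78.67, recall=89.62)
print(round(model_c_f, 2))
```

Under that assumption the result rounds to 83.79, which agrees with the F-Score quoted in the abstract.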
Pages: 95-114
Page count: 20