Bilingual Dictionary Construction with Transliteration Filtering

Times cited: 0
Authors
Richardson, John [1]
Nakazawa, Toshiaki [2]
Kurohashi, Sadao [1]
Affiliations
[1] Kyoto University, Graduate School of Informatics, Kyoto 606-8501, Japan
[2] Japan Science and Technology Agency, Chiyoda-ku, Tokyo 102-8666, Japan
Source
LREC 2014: Ninth International Conference on Language Resources and Evaluation | 2014
Keywords
transliteration; lexicon; Katakana
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
In this paper we present a bilingual transliteration lexicon of 170K Japanese-English technical terms in the scientific domain. Translation pairs are extracted by filtering a large list of transliteration candidates generated automatically from a phrase table trained on parallel corpora. Filtering uses a novel transliteration similarity measure based on a discriminative phrase-based machine translation approach. We demonstrate that the extracted dictionary is both accurate and high in recall (F1-score of 0.8). Our lexicon contains not only single words but also multi-word expressions, and is freely available. Our experiments focus on Katakana-English lexicon construction; however, the proposed methods could also be applied to transliteration extraction for a variety of other language pairs.
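The filter-then-evaluate pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: their similarity measure is a discriminative phrase-based machine translation model, whereas this sketch swaps in a plain `difflib.SequenceMatcher` ratio between a crudely romanized Katakana string and the English candidate. The `KANA_TO_LATIN` table, the candidate pairs, the gold set and the 0.4 threshold are all hypothetical, chosen only to show how thresholded filtering is scored with precision, recall and F1.

```python
from difflib import SequenceMatcher

# Hypothetical toy Katakana-to-Latin table covering only the example words.
# A real system would need a full romanization scheme (small kana, sokuon, etc.).
KANA_TO_LATIN = {
    "テ": "te", "キ": "ki", "ス": "su", "ト": "to",
    "モ": "mo", "デ": "de", "ル": "ru", "タ": "ta",
    "ー": "",  # long-vowel mark: simply dropped in this sketch
}


def romanize(katakana: str) -> str:
    """Map each Katakana character through the toy table (unknown chars kept as-is)."""
    return "".join(KANA_TO_LATIN.get(ch, ch) for ch in katakana)


def similarity(katakana: str, english: str) -> float:
    """Stand-in score: difflib ratio of romanized Katakana vs. lowercased English.

    This is NOT the paper's discriminative phrase-based MT similarity measure.
    """
    return SequenceMatcher(None, romanize(katakana), english.lower()).ratio()


def filter_candidates(candidates, threshold):
    """Keep (katakana, english) pairs whose similarity reaches the threshold."""
    return {pair for pair in candidates if similarity(*pair) >= threshold}


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 of predicted pairs against a gold set."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical candidate pairs, e.g. as they might be read off a phrase table.
candidates = {
    ("テキスト", "text"),
    ("モデル", "model"),
    ("データ", "data"),
    ("モデル", "mode"),     # near-miss that surface similarity lets through
    ("テキスト", "model"),  # clearly wrong pairing
}
gold = {("テキスト", "text"), ("モデル", "model"), ("データ", "data")}

kept = filter_candidates(candidates, threshold=0.4)
p, r, f1 = precision_recall_f1(kept, gold)
print(sorted(kept))
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

On this toy data the filter keeps four pairs, including the near-miss モデル/mode, so it prints P=0.75 R=1.00 F1=0.86. The example illustrates why a plain surface-overlap score is a weak filter on its own: it cannot distinguish a truncated English word from a true transliteration.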
Pages: 1013-1017
Number of pages: 5