Selection of Japanese-English Equivalents by Integrating High-quality Corpora and Huge Amounts of Web Data

Cited by: 0
Authors
Ma, Qing [1, 2]
Koichi, Nakao [1]
Murata, Masaki [2]
Isahara, Hitoshi [2]
Affiliations
[1] Ryukoku Univ, Otsu, Shiga 5202194, Japan
[2] Natl Inst Informat & Commun Technol, Kyoto 6190289, Japan
Source
SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008 | 2008
Keywords
DOI
Not available
Chinese Library Classification
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
As a first step toward developing systems that enable non-native speakers to produce near-perfect English sentences from mixed English-Japanese input sentences, we propose new approaches for selecting English equivalents based on the number of hits for various contexts in large English corpora. As the corpora, we used not only huge amounts of Web data but also large, manually compiled, high-quality English corpora. The high-quality corpora enable accurate selection of equivalents, while the Web data resolve the shortage of hits that normally occurs when only high-quality corpora are used. The types and lengths of the contexts used to select equivalents are variable and are determined optimally according to the number of hits in the corpora, which further improves performance. Computer experiments showed that the precision of our methods was much higher than that of existing methods for equivalent selection.
Pages: 416-421
Number of pages: 6