Character N-grams translation in cross-language information retrieval

Cited by: 0
Authors
Vilares, Jesus [1]
Oakes, Michael P. [2]
Vilares, Manuel [3]
Affiliations
[1] Univ A Coruna, Dept Comp Sci, Campus Elvinas S-N, La Coruna 15071, Spain
[2] Univ Sunderland, Sch Comp & Technol, Sunderland SR6 0DD, Durham, England
[3] Univ Vigo, Dept Comp Sci, Orense 32004, Spain
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS | 2007 / Vol. 4592
Keywords
Cross-Language Information Retrieval; character N-grams; translation algorithms; alignment algorithms; association measures
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a new technique for the direct translation of character n-grams for use in Cross-Language Information Retrieval systems. This solution avoids the need for word normalization during indexing or translation, and it can also deal with out-of-vocabulary words. This knowledge-light approach does not rely on language-specific processing, and it can be used with languages of very different natures even when linguistic information and resources are scarce or unavailable. Our proposal also tries to achieve a higher speed during the n-gram alignment process with respect to previous approaches.
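As an illustration of why character n-grams can stand in for word normalization, the following is a minimal sketch of overlapping n-gram extraction. It is not the authors' implementation: the function name `char_ngrams`, the default length `n = 4`, the lowercasing step, and the rule of keeping short words whole are all assumptions made for this example.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Split each whitespace-separated word into overlapping character n-grams.

    Hypothetical helper for illustration only; words no longer than n
    characters are kept whole (an assumption, not taken from the paper).
    """
    grams: list[str] = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            # Slide a window of width n across the word, one character at a time.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# Inflected forms share most of their n-grams, so no stemmer is needed.
singular = set(char_ngrams("tomato"))
plural = set(char_ngrams("tomatoes"))
shared = sorted(singular & plural)
print(shared)
```

Because "tomato" and "tomatoes" share the n-grams "toma", "omat", and "mato", a query containing one form can still match documents containing the other, and a word absent from any dictionary is still decomposed into indexable units.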
Pages: 217 / +
Page count: 2