Character N-grams translation in cross-language information retrieval

Times cited: 0
Authors
Vilares, Jesus [1]
Oakes, Michael P. [2]
Vilares, Manuel [3]
Affiliations
[1] Univ A Coruna, Dept Comp Sci, Campus Elvinas S-N, La Coruna 15071, Spain
[2] Univ Sunderland, Sch Comp & Technol, Sunderland SR6 0DD, Durham, England
[3] Univ Vigo, Dept Comp Sci, Orense 32004, Spain
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS | 2007 / Vol. 4592
Keywords
Cross-Language Information Retrieval; character N-grams; translation algorithms; alignment algorithms; association measures
DOI
Not available
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a new technique for the direct translation of character n-grams for use in Cross-Language Information Retrieval systems. This solution avoids the need for word normalization during indexing or translation, and it can also deal with out-of-vocabulary words. This knowledge-light approach does not rely on language-specific processing, and it can be used with very different kinds of languages, even when linguistic information and resources are scarce or unavailable. Our proposal also aims to speed up the n-gram alignment process relative to previous approaches.
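The record does not spell out the algorithm. As a rough illustration of what translating character n-grams directly can look like, the following Python sketch splits words into overlapping 4-grams and scores each source/target n-gram pair with the Dice coefficient, one of the association measures named in the keywords, over a small list of word translation pairs. The word-pair input, the choice of n = 4, the Dice measure, and the function names (`char_ngrams`, `ngram_translation_table`, `translate_query`) are all assumptions made for illustration; this is a hypothetical sketch, not the authors' published method.

```python
# Hypothetical sketch of n-gram-level translation; not the authors' exact algorithm.
from collections import Counter
from itertools import product


def char_ngrams(word: str, n: int = 4) -> list[str]:
    """Return the overlapping character n-grams of a lowercased word.

    Words no longer than n are kept whole (an assumption of this sketch).
    """
    word = word.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_translation_table(
    aligned_pairs: list[tuple[str, str]], n: int = 4
) -> dict[str, tuple[str, float]]:
    """Map each source n-gram to the target n-gram with the highest Dice score.

    aligned_pairs is assumed to be a list of (source_word, target_word)
    translation pairs, e.g. taken from a word-aligned parallel corpus.
    """
    src_freq: Counter = Counter()
    tgt_freq: Counter = Counter()
    joint: Counter = Counter()
    for src_word, tgt_word in aligned_pairs:
        # Count each distinct n-gram once per word pair; sorting keeps
        # the iteration order (and thus tie-breaking) deterministic.
        src_grams = sorted(set(char_ngrams(src_word, n)))
        tgt_grams = sorted(set(char_ngrams(tgt_word, n)))
        src_freq.update(src_grams)
        tgt_freq.update(tgt_grams)
        joint.update(product(src_grams, tgt_grams))

    table: dict[str, tuple[str, float]] = {}
    for (s, t), f_st in joint.items():
        # Dice coefficient: 2 * f(s, t) / (f(s) + f(t))
        score = 2 * f_st / (src_freq[s] + tgt_freq[t])
        if s not in table or score > table[s][1]:
            table[s] = (t, score)
    return table


def translate_query(
    query: str, table: dict[str, tuple[str, float]], n: int = 4
) -> list[str]:
    """Replace each query n-gram with its best-scoring target n-gram.

    N-grams missing from the table are dropped (one possible policy).
    """
    result = []
    for word in query.split():
        for gram in char_ngrams(word, n):
            if gram in table:
                result.append(table[gram][0])
    return result


if __name__ == "__main__":
    pairs = [
        ("nation", "nación"),
        ("station", "estación"),
        ("action", "acción"),
        ("motion", "moción"),
    ]
    table = ngram_translation_table(pairs)
    print(table["tion"])                     # ('ción', 1.0)
    print(translate_query("nation", table))  # ['naci', 'ació', 'ción']
```

Because matching happens on n-grams rather than whole words, a query word absent from the training pairs (for example "nations") can still be partly translated in this sketch through the n-grams it shares with known words, and no stemmer or lemmatizer is involved. A realistic system would estimate the counts from a large aligned corpus, where ties between candidate n-grams become much rarer than in a toy list.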
Pages: 217-+
Page count: 2
Cited references
11 entries in total
[1] Amati, G.; Van Rijsbergen, C. J. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems, 2002, 20(4): 357-389.
[2] Koehn, P. Proceedings of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, 2003, p. 48.
[3] Koehn, P. MT Summit, 2005, vol. 2005, p. 79.
[4] Manning, C. Foundations of Statistical Natural Language Processing, 1999.
[5] McNamee, P. Lecture Notes in Computer Science, 2003, vol. 3237, p. 85.
[6] McNamee, P.; Mayfield, J. Character N-gram tokenization for European language text retrieval. Information Retrieval, 2004, 7(1-2): 73-97.
[7] Nardi, A. WORKING NOTES CLEF 2, 2006.
[8] Och, F. J. SYSTEMATIC COMP VARI, 2003.
[9] Porter, M. F. An algorithm for suffix stripping. Program: Automated Library and Information Systems, 1980, 14(3): 130-137.
[10] Savoy, J. Cross-language information retrieval: experiments based on CLEF 2000 corpora. Information Processing & Management, 2003, 39(1): 75-115.