Character N-grams translation in cross-language information retrieval

Cited by: 0

Authors:

Vilares, Jesus [1]

Oakes, Michael P. [2]

Vilares, Manuel [3]

Affiliations:

[1] Univ A Coruna, Dept Comp Sci, Campus Elvinas S-N, La Coruna 15071, Spain

[2] Univ Sunderland, Sch Comp & Technol, Sunderland SR6 0DD, Durham, England

[3] Univ Vigo, Dept Comp Sci, Orense 32004, Spain

Source:

NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PROCEEDINGS | 2007 / Vol. 4592

Keywords:

Cross-Language Information Retrieval; character N-grams; translation algorithms; alignment algorithms; association measures

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper describes a new technique for the direct translation of character n-grams for use in Cross-Language Information Retrieval systems. The technique avoids the need for word normalization during indexing or translation, and it can also handle out-of-vocabulary words. This knowledge-light approach does not rely on language-specific processing, so it can be applied to languages of very different natures, even when linguistic information and resources are scarce or unavailable. Our proposal also aims to make the n-gram alignment process faster than in previous approaches.
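To make the abstract's premise concrete, here is a minimal sketch of overlapping character n-gram tokenization. It is an illustration only, not the paper's translation or alignment algorithm: the function name `char_ngrams`, the default `n = 4`, and the choice to keep short words whole are all assumptions made for this example.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Split each whitespace-separated word into overlapping character n-grams.

    Hypothetical helper for illustration; words no longer than n are kept whole.
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            # Slide a window of width n over the word, one character at a time.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# Two inflected forms share most of their n-grams without any stemming.
shared = set(char_ngrams("translation")) & set(char_ngrams("translated"))
print(sorted(shared))
```

Because related word forms overlap in their n-grams, matching can proceed without a stemmer or lemmatizer, and a word missing from any dictionary still decomposes into n-grams that may be known. This is the general property that n-gram based retrieval relies on.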


Pages: 217 / +

Number of pages: 2
