Prediction of performance of cross-language information retrieval using automatic evaluation of translation

Cited by: 9
Author
Kishida, Kazuaki [1]
Affiliation
[1] Keio Univ, Sch Lib & Informat Sci, Minato Ku, Tokyo 1088345, Japan
DOI
10.1016/j.lisr.2007.09.003
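Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract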
This study develops regression models for predicting the performance of cross-language information retrieval (CLIR). The model assumes that CLIR performance can be explained by two factors: (1) the ease of search inherent in each query and (2) the translation quality in the process of CLIR systems. As operational variables, monolingual information retrieval (IR) performance is used for measuring the ease of search, and the well-known evaluation metric BLEU is used to measure the translation quality. This study also proposes an alternative metric, weighted average for matched unigrams (WAMU), which is tailored to gauging translation quality for special IR purposes. The data for regression analysis are obtained from a retrieval experiment of English-to-Italian bilingual searches using the CLEF 2003 test collection. The CLIR and monolingual IR performances are measured by average precision score. The result shows that the proposed regression model can explain about 60% of the variation in CLIR performance, and WAMU has more predictive power than BLEU. A back translation method for applying the regression model to operational CLIR systems in real situations is discussed. © 2008 Elsevier Inc. All rights reserved.
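The two-factor model in the abstract can be illustrated with a small ordinary least squares fit. The following is only a sketch under stated assumptions, not the paper's actual model: the abstract does not give the functional form, so a plain additive linear form (CLIR AP = b0 + b1 * monolingual AP + b2 * translation quality) is assumed here. The function names, the toy coefficients, and the randomly generated per-query scores are all hypothetical. The translation quality column stands in for any per-query score such as BLEU or WAMU, whose definitions are not reproduced here.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_clir_regression(mono_ap, quality, clir_ap):
    """Fit clir_ap ~ b0 + b1*mono_ap + b2*quality (assumed linear form).

    Returns ([b0, b1, b2], r_squared).
    """
    rows = [[1.0, m, q] for m, q in zip(mono_ap, quality)]
    k = 3
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, clir_ap)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(clir_ap) / len(clir_ap)
    ss_res = sum((y - f) ** 2 for y, f in zip(clir_ap, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in clir_ap)
    return coef, 1.0 - ss_res / ss_tot


# Hypothetical per-query scores; these are NOT data from the study.
rng = random.Random(0)
mono = [rng.uniform(0.05, 0.9) for _ in range(60)]    # monolingual AP
quality = [rng.uniform(0.0, 1.0) for _ in range(60)]  # BLEU- or WAMU-like score
clir = [0.02 + 0.6 * m + 0.3 * q + rng.gauss(0.0, 0.08)
        for m, q in zip(mono, quality)]               # bilingual AP

coef, r2 = fit_clir_regression(mono, quality, clir)
print("coefficients (b0, b1, b2):", [round(c, 3) for c in coef])
print("R^2:", round(r2, 3))
```

The R^2 value returned by the fit corresponds to the "proportion of variation explained" that the abstract reports. Comparing the R^2 obtained with two different translation quality columns is one simple way to compare the predictive power of two metrics under this assumed model.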
Pages: 138-144
Number of pages: 7