A comparison of collocation-based similarity measures in query expansion

Cited by: 47
Authors
Kim, MC
Choi, KS
Affiliations
[1] SungKongHoe Univ, Dept Comp & Informat Sci, Kuro Ku, Seoul 152716, South Korea
[2] Korea Adv Inst Sci & Technol, Dept Comp Sci, Yusung Ku, Taejon 305701, South Korea
Keywords
A comparison of collocation-based similarity measures in query expansion;
DOI
10.1016/S0306-4573(98)00040-5
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
In this paper, we compare collocation-based similarity measures (Jaccard, Dice and Cosine) for selecting additional search terms in query expansion. We also consider two further similarity measures: average conditional probability (ACP) and normalized mutual information (NMI). ACP is the mean of the two conditional probabilities between a query term and an additional search term. NMI is a normalized value of the two terms' mutual information. All of these similarity measures are functions of the two terms' frequencies and their collocation frequency, but they differ in how they measure similarity. The selected measure changes the order of the additional search terms and their weights, and hence has a strong influence on retrieval performance. In our query expansion experiments with these five similarity measures, the additional search terms selected by Jaccard, Dice and Cosine include more frequent terms with lower similarity values than those selected by ACP or NMI. In the overall assessment of query expansion, Jaccard, Dice and Cosine are better than ACP and NMI in terms of retrieval effectiveness, whereas NMI and ACP are better in terms of execution efficiency. © 1999 Elsevier Science Ltd. All rights reserved.
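The abstract describes each measure as a function of two terms' frequencies and their collocation frequency. A minimal Python sketch of four of them follows, using the textbook forms of Jaccard, Dice and Cosine on counts and ACP as the mean of the two conditional probabilities. The function names, the `rank` helper, and the candidate counts are illustrative assumptions, not taken from the paper. NMI is left out because the abstract does not state how the mutual information is normalized.

```python
from math import sqrt

# fx, fy: frequencies of the query term and the candidate term.
# fxy: their collocation (co-occurrence) frequency.
# What is counted (documents, windows, ...) is not stated in the abstract.

def jaccard(fx, fy, fxy):
    # Collocations over units containing at least one of the two terms.
    return fxy / (fx + fy - fxy)

def dice(fx, fy, fxy):
    # Twice the collocations over the sum of the two frequencies.
    return 2 * fxy / (fx + fy)

def cosine(fx, fy, fxy):
    # Collocations over the geometric mean of the two frequencies.
    return fxy / sqrt(fx * fy)

def acp(fx, fy, fxy):
    # Mean of the two conditional probabilities P(y|x) and P(x|y).
    return 0.5 * (fxy / fx + fxy / fy)

def rank(query_freq, candidates, measure):
    # Order candidate expansion terms by decreasing similarity to the query term.
    scored = [(term, measure(query_freq, fy, fxy))
              for term, (fy, fxy) in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical counts: term -> (frequency, collocation frequency with the query term).
candidates = {
    "frequent_term": (200, 60),
    "rare_term": (10, 10),
}
QUERY_FREQ = 100  # hypothetical frequency of the query term

for measure in (jaccard, dice, cosine, acp):
    order = [term for term, _ in rank(QUERY_FREQ, candidates, measure)]
    print(measure.__name__, order)
```

With these made-up counts, Jaccard, Dice and Cosine rank the more frequent candidate first while ACP ranks the rare candidate first, which is in line with the abstract's observation that the first three measures tend to select more frequent terms. This is a toy illustration, not a reproduction of the paper's experiments.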
Pages: 19-30
Number of pages: 12