Re-ranking search results using language models of query-specific clusters

Cited by: 30
Author
Kurland, Oren [1]
Affiliation
[1] Technion Israel Inst Technol, Fac Ind Engn & Management, IL-32000 Technion, Haifa, Israel
Source
INFORMATION RETRIEVAL | 2009 / Vol. 12 / No. 4
Funding
National Science Foundation (USA);
Keywords
Query-specific clusters; Cluster-based language models; Cluster-based re-ranking; Cluster-based smoothing; INFORMATION;
DOI
10.1007/s10791-008-9065-9
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
To obtain high precision at top ranks by a search performed in response to a query, researchers have proposed a cluster-based re-ranking paradigm: clustering an initial list of documents that are the most highly ranked by some initial search, and using information induced from these (often called) query-specific clusters for re-ranking the list. However, results concerning the effectiveness of various automatic cluster-based re-ranking methods have been inconclusive. We show that using query-specific clusters for automatic re-ranking of top-retrieved documents is effective with several methods in which clusters play different roles, among which is the smoothing of document language models. We do so by adapting previously-proposed cluster-based retrieval approaches, which are based on (static) query-independent clusters for ranking all documents in a corpus, to the re-ranking setting wherein clusters are query-specific. The best performing method that we develop outperforms both the initial document-based ranking and some previously proposed cluster-based re-ranking approaches; furthermore, this algorithm consistently outperforms a state-of-the-art pseudo-feedback-based approach. In further exploration we study the performance of cluster-based smoothing methods for re-ranking with various (soft and hard) clustering algorithms, and demonstrate the importance of clusters in providing context from the initial list through a comparison to using single documents to this end.
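As an illustration of the cluster-based smoothing idea mentioned in the abstract, the following is a minimal sketch and not the paper's actual algorithm. It assumes hard clusters are already given, interpolates each document's unigram model with its cluster's model and a background model estimated from the initial list, and re-ranks by query log-likelihood. The function names, the weights `lam` and `beta`, the probability floor, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def mle(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

def rerank(query, docs, clusters, lam=0.4, beta=0.1, floor=1e-12):
    """Re-rank an initial list by query likelihood under smoothed models.

    docs     : doc id -> list of tokens (the initially retrieved list)
    clusters : doc id -> list of doc ids in that document's cluster
               (assumed to be produced by some clustering step)
    lam/beta : illustrative interpolation weights, not values from the paper
    """
    # Background model estimated from the whole initial list (an assumption).
    background = mle([t for tokens in docs.values() for t in tokens])
    scores = {}
    for doc_id, tokens in docs.items():
        doc_lm = mle(tokens)
        cluster_lm = mle([t for member in clusters[doc_id] for t in docs[member]])
        score = 0.0
        for term in query:
            p = ((1 - lam - beta) * doc_lm.get(term, 0.0)
                 + lam * cluster_lm.get(term, 0.0)
                 + beta * background.get(term, 0.0))
            # Floor guards against log(0) for terms absent from the whole list.
            score += math.log(max(p, floor))
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: d1 and d2 share a cluster, d3 is a singleton.
docs = {
    "d1": ["cluster", "language", "model"],
    "d2": ["cluster", "ranking", "search"],
    "d3": ["football", "match", "score"],
}
clusters = {"d1": ["d1", "d2"], "d2": ["d1", "d2"], "d3": ["d3"]}
ranking = rerank(["language", "model"], docs, clusters)
```

In this toy case neither `d2` nor `d3` contains a query term, yet `d2` is ranked above `d3` because its cluster (which includes `d1`) contributes probability mass for the query terms. This is the kind of context from the initial list that the abstract attributes to clusters.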
Pages: 437-460
Number of pages: 24