Reranking web search results for diversity

Cited by: 0
Authors
Ralf Krestel
Peter Fankhauser
Affiliations
[1] L3S Research Center - Leibniz Universität Hannover
[2] DFKI - German Research Center for Artificial Intelligence
Source
Information Retrieval | 2012 / Vol. 15
Keywords
Diversity; Web search; Reranking; Language models; Topic models; Variance; Diversity evaluation; Wikipedia
DOI
Not available
Abstract
Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better-balanced result set, increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking: smoothed language models and topic models derived by Latent Dirichlet allocation. To evaluate our approach, we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community-generated, balanced representation of their (sub)topics. For these queries, we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts.
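As a rough illustration of the Kullback-Leibler-based evaluation idea mentioned in the abstract, the sketch below compares the topic distribution of a top-k result list against a balanced reference distribution. The abstract does not specify the paper's exact formulation, so everything here is an assumption made for illustration: the function names `topic_distribution` and `kl_divergence`, the additive smoothing constant `eps`, the direction of the divergence, and the toy topic labels are all hypothetical.

```python
import math
from collections import Counter


def topic_distribution(labels, topics, eps=1e-9):
    """Smoothed relative frequency of each topic among the result labels.

    eps is a hypothetical additive smoothing constant that keeps every
    probability strictly positive, so the divergence below stays finite.
    """
    counts = Counter(labels)
    total = len(labels) + eps * len(topics)
    return [(counts[t] + eps) / total for t in topics]


def kl_divergence(p, q):
    """KL(p || q) in bits for two discrete distributions of equal length."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy example: three (sub)topics of an ambiguous query (illustrative labels).
topics = ["animal", "car", "software"]
reference = [1 / 3, 1 / 3, 1 / 3]  # assumed balanced reference distribution

# One topic label per result in a hypothetical top-k list.
biased_results = ["car", "car", "car", "car", "animal"]
diverse_results = ["car", "animal", "software", "car", "animal", "software"]

kl_biased = kl_divergence(reference, topic_distribution(biased_results, topics))
kl_diverse = kl_divergence(reference, topic_distribution(diverse_results, topics))
```

Under these assumptions, a lower divergence indicates that the result list covers the reference topics more evenly, so `kl_diverse` should come out smaller than `kl_biased`.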
Pages: 458-477
Number of pages: 19