Design Trade-Offs for Search Engine Caching

Cited by: 56
Authors
Baeza-Yates, Ricardo [1]
Gionis, Aristides [1]
Junqueira, Flavio P. [1]
Murdock, Vanessa [1]
Plachouras, Vassilis [1]
Silvestri, Fabrizio [2]
Affiliations
[1] Yahoo Res Barcelona, Barcelona 08018, Spain
[2] CNR, Ist ISTI A Faedo, I-56100 Pisa, Italy
Keywords
Algorithms; Design; Caching; Web search; Query logs
DOI
10.1145/1409220.1409223
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this article we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year, we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log influence the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.
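The static vs. dynamic distinction mentioned in the abstract can be illustrated with a minimal sketch. This is not the article's algorithm or its data: the function names `static_hit_rate` and `lru_hit_rate`, the choice of LRU as the dynamic policy, and the toy query streams are all assumptions made for illustration. A static cache is filled once from a past (training) log and never changes; a dynamic cache evicts entries as new queries arrive.

```python
from collections import Counter, OrderedDict


def static_hit_rate(train, test, capacity):
    """Static cache: preload the `capacity` most frequent training queries, never evict."""
    cached = {query for query, _ in Counter(train).most_common(capacity)}
    hits = sum(1 for query in test if query in cached)
    return hits / len(test)


def lru_hit_rate(test, capacity):
    """Dynamic cache (LRU assumed here): evict the least recently used query when full."""
    cache = OrderedDict()
    hits = 0
    for query in test:
        if query in cache:
            hits += 1
            cache.move_to_end(query)       # mark as most recently used
        else:
            cache[query] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # drop the least recently used entry
    return hits / len(test)


# Hypothetical toy logs, not taken from the article.
train = ["a", "b", "a", "c", "a", "b"]
test = ["a", "d", "b", "e", "a", "f", "b"]

print(static_hit_rate(train, test, capacity=2))
print(lru_hit_rate(test, capacity=2))
```

On this contrived stream the popular queries are interleaved with one-off queries, so the preloaded static cache keeps serving them while the small LRU cache keeps evicting them. Real query logs behave differently, and the outcome depends on cache size and on how quickly the query distribution drifts.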
Pages: 28