Quantifying Performance and Quality Gains in Distributed Web Search Engines

Cited by: 28
Authors
Barla Cambazoglu, B. [1]
Plachouras, Vassilis [1]
Baeza-Yates, Ricardo [1]
Affiliation
[1] Yahoo Research, Barcelona, Spain
Source
Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval | 2009
Keywords
Data centers; distributed query processing; index partitioning; Web crawling
DOI
10.1145/1571941.1572013
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Distributed search engines based on geographical partitioning of a central Web index are emerging as a feasible solution to the immense growth of the Web, user bases, and query traffic. However, there is still a lack of research quantifying the performance and quality gains that such architectures can achieve. In this paper, we develop various cost models to evaluate the performance benefits of a geographically distributed search engine architecture based on partial index replication and query forwarding. Specifically, we focus on possible performance gains due to the distributed nature of query processing and Web crawling. We show that any response time gain achieved by distributed query processing can be used to improve search relevance, since it makes room for more complex but more accurate document ranking algorithms. We also show that distributed Web crawling leads to better Web coverage and investigate whether this improves search quality. We verify the validity of our claims via simulations over large, real-life datasets.
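The abstract's central argument, that latency saved by answering queries at a nearby site can be spent on costlier ranking, can be illustrated with a deliberately simple model. The sketch below is hypothetical and is not the paper's cost model: it assumes a single user-to-site round-trip time, a fixed query processing time, a fraction `p_local` of queries that the local (partially replicated) index can answer, and one extra inter-site round trip for every forwarded query. The function names and all numbers are invented for illustration.

```python
"""Toy latency model for a geographically distributed search engine.

Hypothetical illustration only; not the cost model from the paper.
"""


def expected_response_ms(user_rtt_ms: float, processing_ms: float,
                         p_local: float, forward_rtt_ms: float) -> float:
    """Expected user-perceived response time in milliseconds.

    Assumption: a fraction p_local of queries is answered by the user's
    nearest site; the rest are forwarded to another site, which adds one
    extra inter-site round trip (forward_rtt_ms).
    """
    if not 0.0 <= p_local <= 1.0:
        raise ValueError("p_local must be in [0, 1]")
    local = user_rtt_ms + processing_ms   # answered at the nearest site
    forwarded = local + forward_rtt_ms    # extra hop to a remote site
    return p_local * local + (1.0 - p_local) * forwarded


def extra_ranking_budget_ms(target_ms: float, expected_ms: float) -> float:
    """Time left under a fixed response-time target that could be spent on
    costlier (more accurate) ranking; zero if the target is already exceeded."""
    return max(0.0, target_ms - expected_ms)


if __name__ == "__main__":
    TARGET_MS = 300.0  # made-up response-time target
    # Centralized: every query pays a long round trip to a single data center.
    central = expected_response_ms(user_rtt_ms=150.0, processing_ms=100.0,
                                   p_local=1.0, forward_rtt_ms=0.0)
    # Distributed: nearby site, but 20% of queries must be forwarded.
    distributed = expected_response_ms(user_rtt_ms=20.0, processing_ms=100.0,
                                       p_local=0.8, forward_rtt_ms=150.0)
    for label, t in (("centralized", central), ("distributed", distributed)):
        print(f"{label}: {t:.1f} ms, ranking budget "
              f"{extra_ranking_budget_ms(TARGET_MS, t):.1f} ms")
```

With these invented numbers the distributed setup averages about 150 ms against 250 ms for the centralized one, leaving roughly three times as much of the 300 ms target for more expensive ranking. The forwarding term shows why the gain depends on how many queries the local index can answer: as `p_local` falls, the saving shrinks.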
Pages: 411-418 (8 pages)