A GNP-based Scheduling Strategy for Distributed Crawling

Cited by: 3
Authors
Liu, Shuang [1]
Xu, Xiao [1]
Li, Dong [1]
Zhang, Wei-zhe [1]
Liu, Xin-ran [2]
Affiliations
[1] Harbin Inst Technol, Dept Comp Sci & Technol, Harbin 150006, Peoples R China
[2] Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Beijing, Peoples R China
Source
WISM: 2009 INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS | 2009
Funding
National Natural Science Foundation of China
Keywords
distributed crawling; scheduling strategies; load balancing; network measurement; GNP;
DOI
10.1109/WISM.2009.136
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To address task scheduling and load balancing in distributed search engines, this paper proposes a GNP-based scheduling strategy for distributed crawling together with a load balancing method. An Internet distance estimation mechanism replaces large-scale network distance measurement, which both improves the system's response speed and reduces the load the system places on the WAN. By deploying crawling nodes across WANs, we built a distributed search engine and implemented several scheduling strategies. Online experiments show a substantial improvement in system performance.
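
The idea in the abstract can be illustrated with a minimal sketch. This record does not give the paper's actual algorithm, so everything below is an assumption: hosts and crawling nodes are taken to already have GNP-style coordinates, the estimated network distance is their Euclidean distance, and load balancing is a simple per-crawler capacity cap. The names `estimated_distance`, `assign_hosts`, and `capacity` are hypothetical.

```python
import math

def estimated_distance(a, b):
    """Estimate network distance as the Euclidean distance between coordinates.

    Assumption: coordinates were already computed by a GNP-style embedding.
    """
    return math.dist(a, b)

def assign_hosts(crawlers, hosts, capacity):
    """Greedily assign each host to the nearest crawler that still has capacity.

    crawlers, hosts: dicts mapping a name to a coordinate tuple.
    capacity: hypothetical cap on hosts per crawler (simple load balancing).
    """
    load = {name: 0 for name in crawlers}
    assignment = {}
    for host, host_coord in hosts.items():
        # Only crawlers below the cap are eligible for this host.
        candidates = [c for c in crawlers if load[c] < capacity]
        if not candidates:
            raise ValueError("total capacity is smaller than the number of hosts")
        best = min(
            candidates,
            key=lambda c: estimated_distance(crawlers[c], host_coord),
        )
        assignment[host] = best
        load[best] += 1
    return assignment

# Illustrative coordinates only; not data from the paper.
crawlers = {"crawler_a": (0.0, 0.0), "crawler_b": (10.0, 0.0)}
hosts = {"h1": (1.0, 0.0), "h2": (2.0, 0.0), "h3": (3.0, 0.0), "h4": (9.0, 0.0)}
assignment = assign_hosts(crawlers, hosts, capacity=2)
```

In this toy input, `h3` is closer to `crawler_a`, but the cap pushes it to `crawler_b`, which shows how a distance estimate and a load constraint can trade off without any live network measurement.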
Pages: 651 / +
Page count: 2