Architecture of a grid-enabled Web search engine

Cited by: 14
Authors
Cambazoglu, B. Barla [1]
Karaca, Evren [1]
Kucukyilmaz, Tayfun [1]
Turk, Ata [1]
Aykanat, Cevdet [1]
Affiliation
[1] Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkey
Keywords
search engine; Web crawling; text classification; grid computing
DOI
10.1016/j.ipm.2006.10.011
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Search Engine for South-East Europe (SE4SEE) is a socio-cultural search engine running on the grid infrastructure. It offers a personalized, on-demand, country-specific, category-based Web search facility. The main goal of SE4SEE is to attack the page freshness problem by performing the search on the original pages residing on the Web, rather than on the previously fetched copies as done in the traditional search engines. SE4SEE also aims to obtain high download rates in Web crawling by making use of the geographically distributed nature of the grid. In this work, we present the architectural design issues and implementation details of this search engine. We conduct various experiments to illustrate performance results obtained on a grid infrastructure and justify the use of the search strategy employed in SE4SEE. (c) 2006 Elsevier Ltd. All rights reserved.
Pages: 609-623
Page count: 15