Building domain-specific web collections for scientific digital libraries: A meta-search enhanced focused crawling method

Cited by: 0
Authors
Qin, JL [1]
Zhou, YL [1]
Chau, M [1]
Affiliations
[1] Univ Arizona, Dept Management Informat Syst, Tucson, AZ 85721 USA
Source
JCDL 2004: PROCEEDINGS OF THE FOURTH ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES: GLOBAL REACH AND DIVERSE IMPACT | 2004
Keywords
digital libraries; domain-specific collection building; focused crawling; meta-search; Web search algorithm
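DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract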
Collecting domain-specific documents from the Web using focused crawlers has been considered one of the most important strategies to build digital libraries that serve the scientific community. However, because most focused crawlers use local search algorithms to traverse the Web space, they could be easily trapped within a limited sub-graph of the Web that surrounds the starting URLs and build domain-specific collections that are not comprehensive and diverse enough for scientists and researchers. In this study, we investigated the problems of traditional focused crawlers caused by local search algorithms and proposed a new crawling approach, meta-search enhanced focused crawling, to address these problems. We conducted two user evaluation experiments to examine the performance of our proposed approach, and the results showed that our approach could build domain-specific collections with higher quality than traditional focused crawling techniques.
Pages: 135-141
Page count: 7