Seed Selection for Domain-Specific Search

Cited by: 5
Authors
Priyatam, Pattisapu Nikhil [1]
Dubey, Ajay [1]
Perumal, Krish [1]
Praneeth, Sai [1]
Kakadia, Dharmesh [1]
Varma, Vasudeva [1]
Affiliation
[1] IIIT Hyderabad, Search & Informat Extract Lab, Hyderabad, AP, India
Source
WWW'14 COMPANION: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON WORLD WIDE WEB | 2014
DOI
10.1145/2567948.2579216
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The last two decades have witnessed an exponential rise in web content from a plethora of domains, which has necessitated the use of domain-specific search engines. Diversity of crawled content is one of the crucial aspects of a domain-specific search engine. To a large extent, diversity is governed by the initial set of seed URLs. Most of the existing approaches rely on manual effort for seed selection. In this work we automate this process using URLs posted on Twitter. We propose an algorithm to get a set of diverse seed URLs from a Twitter URL graph. We compare the performance of our approach against the baseline zero-similarity seed selection method and find that our approach beats the baseline by a significant margin.
Pages: 923-928
Number of pages: 6