Sources of Evidence for Vertical Selection

Cited by: 57
Authors
Arguello, Jaime [1]
Diaz, Fernando
Callan, Jamie [1]
Crespo, Jean-Francois
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Source
PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2009
Keywords
vertical selection; distributed information retrieval; resource selection; aggregated search; query classification;
DOI
10.1145/1571941.1571997
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Web search providers often include search services for domain-specific subcollections, called verticals, such as news, images, videos, job postings, company summaries, and artist profiles. We address the problem of vertical selection, predicting relevant verticals (if any) for queries issued to the search engine's main web search page. In contrast to prior query classification and resource selection tasks, vertical selection is associated with unique resources that can inform the classification decision. We focus on three sources of evidence: (1) the query string, from which features are derived independent of external resources, (2) logs of queries previously issued directly to the vertical, and (3) corpora representative of vertical content. We focus on 18 different verticals, which differ in terms of semantics, media type, size, and level of query traffic. We compare our method to prior work in federated search and retrieval effectiveness prediction. An in-depth error analysis reveals unique challenges across different verticals and provides insight into vertical selection for future work.
Pages: 315-322
Page count: 8