Robust classification of rare queries using web knowledge

Cited by: 62
Authors
Broder, Andrei Z. [1]
Fontoura, Marcus [1]
Gabrilovich, Evgeniy [1]
Joshi, Amruta [1]
Josifovski, Vanja [1]
Zhang, Tong [1]
Affiliations
[1] Yahoo Research, Santa Clara, CA 95054
Source
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'07 | 2007
Keywords
Blind relevance feedback; Query classification; Web search
DOI
10.1145/1277741.1277783
Abstract
We propose a methodology for building a practical robust query classification system that can identify thousands of query classes with reasonable accuracy, while dealing in real-time with the query volume of a commercial web search engine. We use a blind feedback technique: given a query, we determine its topic by classifying the web search results retrieved by the query. Motivated by the needs of search advertising, we primarily focus on rare queries, which are the hardest from the point of view of machine learning, yet in aggregation account for a considerable fraction of search engine traffic. Empirical evaluation confirms that our methodology yields a considerably higher classification accuracy than previously reported. We believe that the proposed methodology will lead to better matching of online ads to rare queries and overall to a better user experience. Copyright 2007 ACM.
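The blind feedback idea in the abstract (classify the query by classifying the results it retrieves) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `search` and `classify_page` callables, the `top_k` cutoff, and the simple majority vote over per-result labels are hypothetical stand-ins, and the abstract does not say how the paper actually aggregates the per-result classifications.

```python
from collections import Counter
from typing import Callable, Iterable, List


def classify_query(
    query: str,
    search: Callable[[str], Iterable[str]],
    classify_page: Callable[[str], str],
    top_k: int = 10,
) -> List[str]:
    """Return candidate classes for `query`, most-voted first.

    `search` and `classify_page` are hypothetical callables supplied by
    the caller; they are not part of any real search engine API.
    """
    # Blind feedback: take the top results as given, without any
    # relevance judgment from the user.
    results = list(search(query))[:top_k]
    # Classify each retrieved text and count the votes per class
    # (majority voting is an assumed aggregation rule).
    votes = Counter(classify_page(text) for text in results)
    return [label for label, _ in votes.most_common()]


# Toy stand-ins, for illustration only.
def toy_search(query: str) -> List[str]:
    return ["cheap flights to paris", "paris hotel deals", "eiffel tower history"]


def toy_classify(text: str) -> str:
    return "travel" if any(w in text for w in ("flights", "hotel")) else "history"
```

With these toy stand-ins, `classify_query("paris", toy_search, toy_classify)` ranks `"travel"` ahead of `"history"`, because two of the three retrieved texts are labeled `"travel"`.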
Pages: 231-238
Page count: 7
Related papers
22 in total
[1] Beitzel S., Jensen E., Frieder O., Grossman D., Lewis D., Chowdhury A., Kolcz A., Automatic web query classification using labeled and unlabeled training data, Proceedings of SIGIR'05, (2005)
[2] Beitzel S., Jensen E., Frieder O., Lewis D., Chowdhury A., Kolcz A., Improving automatic query classification via semi-supervised learning, Proceedings of ICDM'05, (2005)
[3] Duda R., Hart P., Pattern Classification and Scene Analysis, (1973)
[4] Efthimiadis E., Biron P., UCLA-Okapi at TREC-2: Query expansion experiments, TREC-2, (1994)
[5] Gabrilovich E., Markovitch S., Feature generation for text categorization using world knowledge, IJCAI'05, pp. 1048-1053, (2005)
[6] Gravano L., Hatzivassiloglou V., Lichtenstein R., Categorizing web queries according to geographical locality, CIKM'03, (2003)
[7] Han E., Karypis G., Centroid-based document classification: Analysis and experimental results, PKDD'00, (2000)
[8] Jarvelin K., Kekalainen J., IR evaluation methods for retrieving highly relevant documents, SIGIR'00, (2000)
[9] Kardkovacs Z., Tikk D., Bansaghi Z., The ferrety algorithm for the KDD Cup 2005 problem, SIGKDD Explorations, 7, (2005)
[10] Kowalczyk P., Zukerman I., Niemann M., Analyzing the effect of query class on document retrieval performance, Proc. Australian Conf. on AI, pp. 550-561, (2004)