Learning to rank with (a lot of) word features

Cited by: 69
Authors
Bai, Bing [1 ]
Weston, Jason [1 ]
Grangier, David [1 ]
Collobert, Ronan [1 ]
Sadamasa, Kunihiko [1 ]
Qi, Yanjun [1 ]
Chapelle, Olivier [2 ]
Weinberger, Kilian [2 ]
Affiliations
[1] NEC Labs Amer, Princeton, NJ USA
[2] Yahoo Res, Santa Clara, CA USA
Source
INFORMATION RETRIEVAL | 2010 / Vol. 13 / Issue 03
Keywords
Semantic indexing; Feature hashing; Learning to rank; Cross language retrieval; Content matching;
DOI
10.1007/s10791-009-9117-9
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this article we present Supervised Semantic Indexing, which defines a class of nonlinear (quadratic) models that are discriminatively trained to map directly from the word content of a query-document or document-document pair to a ranking score. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI, our models are trained from a supervised signal directly on the ranking task of interest, which we argue is the reason for our superior results. As the query and target texts are modeled separately, our approach is easily generalized to different retrieval tasks, such as cross-language retrieval or online advertising placement. Dealing with models on all pairs of word features is computationally challenging. We propose several improvements to our basic model to address this issue, including low-rank (but diagonal-preserving) representations, correlated feature hashing, and sparsification. We provide an empirical study of all these methods on retrieval tasks based on Wikipedia documents as well as on an Internet advertisement task. We obtain state-of-the-art performance while providing realistically scalable methods.
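The quadratic scoring idea in the abstract can be illustrated with a minimal sketch. It assumes that the "low-rank but diagonal-preserving" representation takes the form W = UᵀV + I, so that score(q, d) = qᵀ(UᵀV + I)d, and that training uses a pairwise margin ranking loss with plain SGD. Both are assumptions made for illustration, since the abstract does not spell out either detail. All function names, dimensions, and the learning rate are hypothetical.

```python
import random

def make_factor(n_dims, vocab_size, seed):
    """Hypothetical helper: small random n_dims x vocab_size factor matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(vocab_size)]
            for _ in range(n_dims)]

def project(factor, sparse_vec):
    """Map a sparse bag-of-words vector {word_id: weight} into the low-rank space."""
    return [sum(row[i] * w for i, w in sparse_vec.items()) for row in factor]

def score(U, V, q, d):
    """Assumed form: q^T (U^T V + I) d = (Uq).(Vd) + q.d"""
    low_rank = sum(a * b for a, b in zip(project(U, q), project(V, d)))
    diagonal = sum(w * d.get(i, 0.0) for i, w in q.items())  # identity term
    return low_rank + diagonal

def margin_rank_update(U, V, q, d_pos, d_neg, lr=0.01):
    """One SGD step on the assumed hinge loss max(0, 1 - f(q,d+) + f(q,d-)).

    Returns the loss measured before the step is applied.
    """
    loss = max(0.0, 1.0 - score(U, V, q, d_pos) + score(U, V, q, d_neg))
    if loss > 0.0:
        uq = project(U, q)
        v_diff = [a - b for a, b in zip(project(V, d_pos), project(V, d_neg))]
        # Sparse difference vector d+ - d-
        diff = dict(d_pos)
        for j, w in d_neg.items():
            diff[j] = diff.get(j, 0.0) - w
        # Gradient ascent on f(q,d+) - f(q,d-); only touched columns change.
        for k in range(len(U)):
            for i, w in q.items():
                U[k][i] += lr * v_diff[k] * w
            for j, w in diff.items():
                V[k][j] += lr * uq[k] * w
    return loss
```

With U and V set to zero the score reduces to the plain word-overlap product q·d, which is the role the identity term plays in this sketch. The low-rank factors then add learned cross-word correlations on top of that baseline.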
Pages: 291-314
Page count: 24