Learning to Reweight Terms with Distributed Representations

Cited by: 62
Authors
Zheng, Guoqing [1]
Callan, Jamie [1]
Affiliation
[1] Carnegie Mellon Univ, Sch Comp Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
Source
SIGIR 2015: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval | 2015
Keywords
Query term weighting; distributed representations; word vectors; models
DOI
10.1145/2766462.2767700
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Term weighting is a fundamental problem in IR research, and numerous weighting models have been proposed. Proper term weighting can greatly improve retrieval accuracy. It essentially involves two types of query understanding: interpreting the query and judging the relative contribution of each term to the query. These two steps are often handled separately, and the weighting strategies proposed for them tend to be complicated yet not very effective. In this paper, we propose to address query interpretation and term weighting in a unified framework built upon distributed representations of words, drawing on recent advances in neural network language modeling. Specifically, we represent terms and queries as vectors in the same latent space, construct features for terms from their word vectors, and learn a model that maps these features onto defined target term weights. The proposed method is simple yet effective. Experiments using four collections and two retrieval models demonstrate significantly higher retrieval accuracy than the baseline models.
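The abstract describes the pipeline only in outline: term and query vectors in one space, term features built from word vectors, and a learned mapping from those features to target weights. What follows is a minimal, dependency-free Python sketch of that idea. It rests on several assumptions that the abstract does not state. The 2-d word vectors and target weights are invented toy numbers. The query vector is taken to be the mean of the query's term vectors, and each term's feature is its own vector minus that mean. The learned model is ridge regression. Predicted weights are clipped at a small floor and then normalized to sum to 1. All function names (`mean_vector`, `term_features`, `fit_ridge`, `reweight`, and so on) are hypothetical and are not the authors' code.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equal-length vectors (assumed query representation)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def term_features(query: Sequence[str], emb: Dict[str, Vector]) -> List[Vector]:
    """Assumed feature per term: its word vector minus the query's mean vector."""
    q = mean_vector([emb[t] for t in query])
    return [[a - b for a, b in zip(emb[t], q)] for t in query]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(X: List[Vector], y: List[float], lam: float = 0.1) -> List[float]:
    """Ridge regression with an unpenalized bias; returns [bias, w1, ..., wd]."""
    Xb = [[1.0] + row for row in X]  # prepend a constant column for the bias
    d = len(Xb[0])
    xtx = [[sum(r[i] * r[j] for r in Xb) + (lam if i == j and i > 0 else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(d)]
    return solve(xtx, xty)  # normal equations: (X^T X + lam*I') w = X^T y


def predict(w: List[float], x: Vector) -> float:
    """Linear prediction w0 + w . x."""
    return w[0] + sum(a * b for a, b in zip(w[1:], x))


def reweight(query: Sequence[str], emb: Dict[str, Vector], w: List[float],
             floor: float = 1e-3) -> Dict[str, float]:
    """Predict one weight per term, clip at `floor`, and normalize to sum to 1."""
    raw = [max(predict(w, x), floor) for x in term_features(query, emb)]
    total = sum(raw)
    return {t: r / total for t, r in zip(query, raw)}


if __name__ == "__main__":
    # Toy 2-d "word vectors" and target weights: made-up numbers for illustration only.
    emb = {"cheap": [0.1, 0.9], "flights": [0.7, 0.4],
           "boston": [0.9, 0.2], "weather": [0.5, 0.6]}
    train_queries = [["cheap", "flights", "boston"], ["boston", "weather"]]
    targets = [[0.3, 0.8, 0.9], [0.9, 0.6]]
    X = [f for q in train_queries for f in term_features(q, emb)]
    y = [t for ts in targets for t in ts]
    w = fit_ridge(X, y, lam=0.1)
    print(reweight(["cheap", "boston", "weather"], emb, w))
```

The sketch uses linear regression so that it stays short and needs no libraries. Any regressor that maps a feature vector to a scalar could take its place. The normalized weights would then be passed to a retrieval model as per-term query weights, a step not shown here. Because `reweight` returns a dict, a query that repeats a term keeps only one entry for it.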
Pages: 575-584
Page count: 10