The probabilistic relevance framework: BM25 and beyond

Cited by: 1464
Authors
Robertson, Stephen [1]
Zaragoza, Hugo [2]
Affiliations
[1] Microsoft Research, Cambridge CB3 0FB
[2] Yahoo Research, Barcelona 08028
Source
Foundations and Trends in Information Retrieval | 2009 / Vol. 3 / No. 4
DOI
10.1561/1500000019
Abstract
The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970s and 1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters. Copyright © 2009 S. Robertson and H. Zaragoza.
Pages: 333-389
Page count: 56