The probabilistic relevance framework: BM25 and beyond

Cited by: 1464
Authors
Robertson, Stephen [1]
Zaragoza, Hugo [2]
Affiliations
[1] Microsoft Research, Cambridge CB3 0FB
[2] Yahoo Research, Barcelona 08028
Source
Foundations and Trends in Information Retrieval | 2009 / Vol. 3 / No. 4
DOI
10.1561/1500000019
Abstract
The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970s and 1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters. Copyright © 2009 S. Robertson and H. Zaragoza.
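As a rough illustration of the BM25 ranking function named in the abstract, the Python sketch below scores pre-tokenised documents against a query. It is a minimal sketch under stated assumptions, not the authors' reference implementation: the helper name `bm25_scores` is hypothetical, the defaults k1 = 1.2 and b = 0.75 are commonly used values rather than values taken from this record, the term weight is the Robertson/Sparck Jones form with no relevance information, and query-term frequency is ignored (each distinct query term counts once).

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document (hypothetical helper, illustrative only).

    query: list of query terms; docs: list of documents, each a list of terms.
    """
    if not docs:
        return []
    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency n_t: number of documents containing term t.
    df = Counter()
    for d in docs:
        df.update(set(d))
    # RSJ-style weight with no relevance information: log((N - n_t + 0.5) / (n_t + 0.5)).
    # This becomes negative for terms occurring in more than half of the documents.
    idf = {t: math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5)) for t in set(query)}
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation: K = k1 * ((1 - b) + b * dl / avgdl).
        K = k1 * ((1 - b) + b * len(d) / avgdl)
        score = 0.0
        # Each distinct query term contributes at most once (query-term frequency ignored).
        for t in set(query):
            if tf[t]:
                # Saturating term-frequency component: tf * (k1 + 1) / (tf + K).
                score += idf[t] * tf[t] * (k1 + 1) / (tf[t] + K)
        scores.append(score)
    return scores


docs = [["bm25", "ranks", "documents"], ["probabilistic", "relevance", "framework"]]
print(bm25_scores(["bm25"], docs))
```

The two free parameters are the kind of values the abstract's "parameter optimisation" refers to: b controls how strongly long documents are penalised (b = 0 switches length normalisation off), and k1 controls how quickly repeated occurrences of a term saturate. Some implementations, such as Lucene's `BM25Similarity`, use log(1 + (N - n_t + 0.5) / (n_t + 0.5)) instead, which keeps the weight non-negative.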
Pages: 333-389
Number of pages: 56