The probabilistic relevance framework: BM25 and beyond

Cited by: 1464
Authors
Robertson, Stephen [1 ]
Zaragoza, Hugo [2 ]
Affiliations
[1] Microsoft Research, Cambridge CB3 0FB
[2] Yahoo Research, Barcelona 08028
Source
Foundations and Trends in Information Retrieval | 2009, Vol. 3, No. 4
DOI
10.1561/1500000019
Abstract
The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970-1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters. Copyright © 2009 S. Robertson and H. Zaragoza.
Pages: 333-389
Page count: 56