An Analysis of Fusion Functions for Hybrid Retrieval

Cited by: 6
Authors
Bruch, Sebastian [1]
Gai, Siyu [2]
Ingber, Amir [3]
Affiliations
[1] Pinecone, New York, NY 10018 USA
[2] Univ Calif Berkeley, Berkeley, CA USA
[3] Pinecone, Tel Aviv, Israel
Keywords
Hybrid retrieval; lexical and semantic search; fusion functions
DOI
10.1145/3596512
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We study hybrid search in text retrieval where lexical and semantic search are fused together with the intuition that the two are complementary in how they model relevance. In particular, we examine fusion by a convex combination of lexical and semantic scores, as well as the reciprocal rank fusion (RRF) method, and identify their advantages and potential pitfalls. Contrary to existing studies, we find RRF to be sensitive to its parameters; that the learning of a convex combination fusion is generally agnostic to the choice of score normalization; that convex combination outperforms RRF in in-domain and out-of-domain settings; and finally, that convex combination is sample efficient, requiring only a small set of training examples to tune its only parameter to a target domain.
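The two fusion functions named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`min_max`, `convex_fusion`, `rrf`), the choice of min-max normalization, the default `alpha=0.5`, and the treatment of a document missing from one retriever as scoring 0 are all assumptions made for illustration. The RRF constant `k=60` is the value commonly used following Cormack et al.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; one possible normalization (an assumption here)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no ordering information, map everything to 0.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def convex_fusion(lexical: Dict[str, float],
                  semantic: Dict[str, float],
                  alpha: float = 0.5) -> Dict[str, float]:
    """alpha * normalized semantic score + (1 - alpha) * normalized lexical score."""
    lex, sem = min_max(lexical), min_max(semantic)
    docs = set(lex) | set(sem)
    # Assumption: a document missing from one retriever contributes 0 for it.
    return {d: alpha * sem.get(d, 0.0) + (1 - alpha) * lex.get(d, 0.0)
            for d in docs}


def rrf(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: sum of 1 / (k + rank) over rankers, rank from 1."""
    fused: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused


# Hypothetical scores for three documents from each retriever.
lexical = {"a": 10.0, "b": 5.0, "c": 0.0}
semantic = {"a": 0.2, "b": 0.8, "c": 0.5}
convex_scores = convex_fusion(lexical, semantic, alpha=0.5)
rrf_scores = rrf([["a", "b", "c"], ["b", "c", "a"]])
```

The convex combination has a single tunable weight (`alpha`) and uses score magnitudes, whereas RRF uses only rank positions and the constant `k`.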
Pages: 35
Related papers
47 in total
  • [1] Malkov YA, 2018, Arxiv, DOI arXiv:1603.09320
  • [2] Asadi N, 2013, SIGIR'13: THE PROCEEDINGS OF THE 36TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH & DEVELOPMENT IN INFORMATION RETRIEVAL, P997
  • [3] Asadi Nima, 2013, Multi-Stage Search Architectures for Streaming Documents
  • [4] Bajaj P, 2018, Arxiv, DOI arXiv:1611.09268
  • [5] ReNeuIR: Reaching Efficiency in Neural Information Retrieval
    Bruch, Sebastian
    Lucchese, Claudio
    Nardini, Franco Maria
    [J]. PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 3462 - 3465
  • [6] Revisiting Approximate Metric Optimization in the Age of Deep Neural Networks
    Bruch, Sebastian
    Zoghi, Masrour
    Bendersky, Michael
    Najork, Marc
    [J]. PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 1241 - 1244
  • [7] Out-of-Domain Semantics to the Rescue! Zero-Shot Hybrid Retrieval Models
    Chen, Tao
    Zhang, Mingyang
    Lu, Jing
    Bendersky, Michael
    Najork, Marc
    [J]. ADVANCES IN INFORMATION RETRIEVAL, PT I, 2022, 13185 : 95 - 110
  • [8] Reciprocal Rank Fusion outperforms Condorcet and Individual Rank Learning Methods
    Cormack, Gordon V.
    Clarke, Charles L. A.
    Buettcher, Stefan
    [J]. PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2009, : 758 - 759
  • [9] DANG V, 2013, ADV INFORM RETRIEVAL, P423, DOI DOI 10.1007/978-3-642-36973-5_36
  • [10] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171