Not All Relevance Scores are Equal: Efficient Uncertainty and Calibration Modeling for Deep Retrieval Models

Cited by: 11
Authors
Cohen, Daniel [1 ]
Mitra, Bhaskar [2 ]
Lesota, Oleg [3 ]
Rekabsaz, Navid [3 ,4 ]
Eickhoff, Carsten [1 ]
Affiliations
[1] Brown Univ, Providence, RI 02912 USA
[2] Microsoft, Montreal, PQ, Canada
[3] Johannes Kepler Univ Linz, Linz, Austria
[4] Linz Inst Technol, AI Lab, Linz, Austria
Source
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2021
Keywords
uncertainty; neural networks; calibration; search;
DOI
10.1145/3404835.3462951
Chinese Library Classification
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
In any ranking system, the retrieval model outputs a single score for a document based on its belief about how relevant the document is to a given search query. While retrieval models have continued to improve with the introduction of increasingly complex architectures, few works have investigated a retrieval model's belief in its score beyond the scope of a single value. We argue that capturing the model's uncertainty with respect to its own scoring of a document is a critical aspect of retrieval that allows for greater use of current models across new document distributions and collections, and can even improve effectiveness on downstream tasks. In this paper, we address this problem via an efficient Bayesian framework for retrieval models which captures the model's belief in the relevance score through a stochastic process while adding only negligible computational overhead. We evaluate this belief via a ranking-based calibration metric, showing that our approximate Bayesian framework significantly improves a retrieval model's ranking effectiveness through risk-aware reranking, as well as its confidence calibration. Lastly, we demonstrate that this additional uncertainty information is actionable and reliable on downstream tasks, represented here by cutoff prediction.
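The core idea in the abstract, scoring each document as a distribution rather than a point estimate and penalising uncertain documents at reranking time, can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the stochastic forward pass is simulated with Gaussian noise (a stand-in for, e.g., dropout left active at inference time), and the document scores, spreads, and the risk weight `lam` are all hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_scores(n_passes=200):
    """Simulate n_passes stochastic forward passes of a retrieval model
    over 3 candidate documents. Returns an (n_passes, n_docs) array of
    relevance scores. Real systems would rerun the (stochastic) model."""
    base = np.array([2.0, 1.8, 1.5])      # hypothetical mean relevance scores
    spread = np.array([0.05, 0.6, 0.05])  # hypothetical per-document uncertainty
    return base + spread * rng.standard_normal((n_passes, base.size))

scores = stochastic_scores()
mean = scores.mean(axis=0)  # point estimate of relevance per document
std = scores.std(axis=0)    # model's uncertainty per document

# Risk-aware reranking: trade expected relevance against uncertainty.
lam = 1.0
risk_adjusted = mean - lam * std
ranking = np.argsort(-risk_adjusted)  # indices, best document first
```

Under these toy numbers, document 1 has the second-highest mean score but a large spread, so the risk-adjusted ranking demotes it below the more confidently scored document 2; setting `lam = 0` recovers the usual rank-by-mean-score behaviour.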
Pages: 654-664
Number of pages: 11