Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback

Cited by: 26
Authors
Yu, HongChien [1]
Xiong, Chenyan [2]
Callan, Jamie [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Microsoft Res, Redmond, WA USA
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021 | 2021
Keywords
Dense retrieval; query representation; pseudo relevance feedback
DOI
10.1145/3459637.3482124
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Dense retrieval systems conduct first-stage retrieval using embedded representations and simple similarity metrics to match a query to documents. Their effectiveness depends on how well the encoded embeddings capture the semantics of queries and documents, a challenging task due to the shortness and ambiguity of search queries. This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval. ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels. It also keeps the document index unchanged to reduce overhead. ANCE-PRF significantly outperforms ANCE and other recent dense retrieval systems on several datasets. Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.
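The retrieval flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `toy_encode` is a hypothetical hashed bag-of-words stand-in for the trained BERT encoders, joining the query and feedback documents with `[SEP]` is an assumed input layout, and the function names, the `DIM` size, and the example documents are all invented for this sketch. What it does show is the structure: a first-pass search, a second query embedding built from the query plus the top retrieved documents, and a second search against the same, unmodified document index.

```python
import hashlib
import math

DIM = 64  # size of the toy embedding space (assumption for this sketch)


def toy_encode(text):
    """Hypothetical stand-in for a learned encoder: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        if token == "[sep]":  # separator markers carry no content in this toy encoder
            continue
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search(query_vec, index, k):
    """Return the ids of the k documents with the highest dot-product score."""
    ranked = sorted(range(len(index)), key=lambda i: dot(query_vec, index[i]), reverse=True)
    return ranked[:k]


def build_prf_input(query, feedback_docs):
    """Join the query and the feedback documents into one encoder input (assumed layout)."""
    return " [SEP] ".join([query] + feedback_docs)


def prf_retrieve(query, docs, index, k_feedback=2, k_final=3):
    """First-pass search, re-encode the query with feedback documents, search again."""
    first_pass = search(toy_encode(query), index, k_feedback)
    prf_text = build_prf_input(query, [docs[i] for i in first_pass])
    prf_query_vec = toy_encode(prf_text)  # only the query side is re-encoded
    return first_pass, search(prf_query_vec, index, k_final)


docs = [
    "jaguar top speed in the wild",
    "jaguar car engine and speed",
    "big cats hunting speed cheetah jaguar",
    "stock market report",
]
index = [toy_encode(d) for d in docs]  # built once and reused for both passes

first_pass, final = prf_retrieve("jaguar speed", docs, index)
print("feedback documents:", first_pass)
print("final ranking:", final)
```

Because only the query embedding changes between the two passes, the document vectors in `index` never need to be recomputed, which mirrors the abstract's point about keeping the document index unchanged.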
Pages: 3592-3596
Page count: 5