SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

Cited by: 152
Authors
Formal, Thibault [1, 2]
Piwowarski, Benjamin [3 ]
Clinchant, Stephane [1 ]
Affiliations
[1] Naver Labs Europe, Meylan, France
[2] Sorbonne Univ, LIP6, Paris, France
[3] Sorbonne Univ, LIP6, CNRS, Paris, France
Source
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2021
Keywords
neural networks; indexing; sparse representations; regularization
DOI
10.1145/3404835.3463098
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to work well. Meanwhile, there has been a growing interest in learning sparse representations for documents and queries, that could inherit from the desirable properties of bag-of-words models such as the exact matching of terms and the efficiency of inverted indexes. In this work, we present a new first-stage ranker based on explicit sparsity regularization and a log-saturation effect on term weights, leading to highly sparse representations and competitive results with respect to state-of-the-art dense and sparse methods. Our approach is simple, trained end-to-end in a single stage. We also explore the trade-off between effectiveness and efficiency, by controlling the contribution of the sparsity regularization.
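The abstract names two ingredients, a log-saturation effect on term weights and an explicit sparsity regularization, without giving formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each term weight is the sum over input tokens of log(1 + ReLU(logit)), and that the regularizer is a FLOPS-style penalty (sum over vocabulary terms of the squared mean weight across a batch). The function names `log_saturate` and `flops_penalty` and the toy logits are hypothetical.

```python
import math


def log_saturate(logits):
    """Turn per-token vocabulary logits into one weight per vocabulary term.

    logits: list of rows (one per input token), each row holding one score
    per vocabulary term. Assumed form: w_j = sum_i log(1 + max(0, logits[i][j])).
    Negative logits contribute exactly zero, which is what yields sparsity.
    """
    vocab_size = len(logits[0])
    return [
        sum(math.log1p(max(0.0, row[j])) for row in logits)
        for j in range(vocab_size)
    ]


def flops_penalty(batch_weights):
    """Assumed FLOPS-style regularizer: sum_j (mean over batch of w_j) ** 2."""
    n = len(batch_weights)
    vocab_size = len(batch_weights[0])
    return sum(
        (sum(w[j] for w in batch_weights) / n) ** 2
        for j in range(vocab_size)
    )


# Toy vocabulary of 3 terms; the numbers are made up for illustration.
doc_a = log_saturate([[2.0, -1.0, 0.0], [0.5, -3.0, 0.0]])
doc_b = log_saturate([[-1.0, -1.0, 4.0]])
penalty = flops_penalty([doc_a, doc_b])
```

In training, a penalty of this kind would be added to the ranking loss with a weighting coefficient; raising that coefficient pushes more term weights to zero, which is the effectiveness/efficiency trade-off the abstract refers to.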
Pages: 2288-2292
Number of pages: 5