A Sparsifier Model for Efficient Information Retrieval

Cited by: 1
Authors
Dobrynin, Viacheslav [1 ]
Sherman, Mark [1 ]
Abramovich, Roman [1 ]
Platonov, Alexey [1 ]
Affiliations
[1] ITMO Univ, Fac Software Engn & Comp Syst, St Petersburg, Russia
Source
2024 IEEE 18TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES, AICT 2024 | 2024
Keywords
sparsity; inverted index; neural networks; independence;
DOI
10.1109/AICT61888.2024.10740301
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The constant development of dense neural models leads to improved search quality. It is crucial to adapt these models to meet performance requirements. Solutions like SPLADE or SparseEmbed address this by solving the ranking task, whereas our work proposes addressing the simplified task of sparsifying dense vector representations. This approach facilitates the faster adaptation of new dense models for use with efficient inverted indexes. The importance of the independence property for sparse space features, achieved through the use of iVAE, is demonstrated. Additionally, the model is trained to maintain the ranking properties of the dense model, which in our case was a BERT model. As a result, the obtained model showed search quality close to the original BERT model. The proposed sparsification approach can be applied to other tasks requiring sparse spaces by adding new or replacing existing properties of the sparse space. Thus, the paper describes the main aspects of a sparsifier model applied to the task of information retrieval.
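To make the idea in the abstract concrete: the paper's sparsifier maps dense (e.g. BERT) embeddings into a sparse space whose nonzero features can be served from an inverted index. The sketch below is a hypothetical illustration of that pipeline only; it uses a simple top-k magnitude sparsifier in place of the paper's trained iVAE-based model, and all function names are invented for illustration.

```python
import numpy as np

def sparsify_topk(dense: np.ndarray, k: int) -> np.ndarray:
    """Toy stand-in for a learned sparsifier: keep the k
    largest-magnitude components of a dense vector, zero the rest."""
    sparse = np.zeros_like(dense)
    idx = np.argsort(np.abs(dense))[-k:]
    sparse[idx] = dense[idx]
    return sparse

def build_inverted_index(doc_vectors):
    """Map feature id -> postings list of (doc id, weight),
    storing only nonzero entries of each sparse document vector."""
    index = {}
    for doc_id, vec in enumerate(doc_vectors):
        for feat in np.flatnonzero(vec):
            index.setdefault(int(feat), []).append((doc_id, float(vec[feat])))
    return index

def search(index, query_vec, n_docs):
    """Score documents by dot product, touching only the postings
    lists of the query's nonzero features (the inverted-index win)."""
    scores = np.zeros(n_docs)
    for feat in np.flatnonzero(query_vec):
        for doc_id, weight in index.get(int(feat), []):
            scores[doc_id] += weight * query_vec[feat]
    return scores
```

The design point this sketch shows is why sparsification pays off: once vectors are sparse, retrieval cost scales with the number of nonzero features rather than the embedding dimension. The paper's actual contribution is training the sparsifier (with iVAE-enforced feature independence) so that these sparse scores track the original dense model's ranking.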
Pages: 4
References (15 total; first 10 shown)
[1] Campos D. F., 2016, arXiv, abs/1611.09268.
[2] Devlin J., 2019, Proceedings of NAACL-HLT 2019, Vol. 1, p. 4171.
[3] Formal T., Lassance C., Piwowarski B., Clinchant S. From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective. Proceedings of SIGIR '22, 2022, pp. 2353-2359.
[4] Formal T., Piwowarski B., Clinchant S. SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking. Proceedings of SIGIR '21, 2021, pp. 2288-2292.
[5] Formal T., 2021, arXiv.
[6] Hyvärinen A., Oja E. Independent Component Analysis: Algorithms and Applications. Neural Networks, 2000, 13(4-5): 411-430.
[7] Khattab O., Zaharia M. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. Proceedings of SIGIR '20, 2020, pp. 39-48.
[8] Khemakhem I., 2020, arXiv, abs/2002.11537.
[9] Khemakhem I., 2019, International Conference on Artificial Intelligence and Statistics.
[10] Kong W., Dudek J. M., Li C., Zhang M., Bendersky M. SparseEmbed: Learning Sparse Lexical Representations with Contextual Embeddings for Retrieval. Proceedings of SIGIR 2023, 2023, pp. 2399-2403.