Spectral clustering and query expansion using embeddings on the graph-based extension of the set-based information retrieval model

Times cited: 0
Authors
Kalogeropoulos, Nikitas-Rigas [1]
Kontogiannis, George [1]
Makris, Christos [1]
Affiliation
[1] Computer Engineering & Informatics Department, University Campus, Patras 26504, Achaia, Greece
Keywords
Information retrieval; Information retrieval models; Set-based model; Graphical representation of textual data; Clustering; Spectral clustering; Graph embeddings
DOI
10.1016/j.eswa.2024.125771
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a straightforward yet novel approach to enhancing graph-based information retrieval (IR) models by calibrating the relationships between term nodes, which improves evaluation metrics at the retrieval phase while reducing the total size of the graph. This is achieved by integrating spectral clustering, embedding-based graph pruning, and term re-weighting. Spectral clustering assigns each term to a specific cluster, which allows us to propose two pruning methods based on node similarities: in-cluster pruning, which prunes edges between terms within the same cluster, and out-cluster pruning, which prunes edges that connect different clusters. Both methods use spectral embeddings to assess node similarity, resulting in more manageable clusters termed concepts. These concepts are likely to contain semantically similar terms, and each term's concept is defined as the centroid of its cluster. We show that this pruning strategy significantly improves the performance and effectiveness of the overall model while also making its graph sparser. Moreover, during the retrieval phase, the conceptually calibrated centroids are used to re-weight the query terms, and the precomputed embeddings enable efficient query expansion through a k-nearest neighbors (k-NN) approach, offering substantial improvement at minimal additional time cost. To the best of our knowledge, this is the first application of spectral clustering and embedding-based conceptualization to pruning graph-based IR models. Overall, the approach improves both retrieval efficiency and effectiveness; applied across various graph-based IR models, it improves evaluation metrics and produces sparser graphs.
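The clustering and pruning steps described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a symmetric, non-negative term-graph weight matrix `W`, takes spectral embeddings from the eigenvectors of the symmetric normalized Laplacian with the smallest eigenvalues (row-normalized in the Ng-Jordan-Weiss style), clusters them with plain k-means, and prunes an edge when the cosine similarity of its endpoints' embeddings falls below a threshold. The functions `spectral_embedding`, `kmeans`, and `prune`, the thresholds `in_thresh` and `out_thresh`, and the toy graph are all hypothetical; the abstract does not specify the exact pruning criterion.

```python
import numpy as np


def spectral_embedding(W, dim):
    """Embed nodes with the `dim` eigenvectors of the symmetric normalized
    Laplacian L = I - D^{-1/2} W D^{-1/2} that have the smallest eigenvalues."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])  # isolated nodes keep 0
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    X = vecs[:, :dim]
    # Row-normalize so that a dot product between rows is a cosine similarity.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms > 0, norms, 1.0)


def kmeans(X, k, iters=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels, C  # rows of C play the role of concept centroids


def prune(W, X, labels, in_thresh, out_thresh):
    """Zero out edges whose endpoint embeddings are not similar enough.
    in_thresh applies to edges inside a cluster (in-cluster pruning),
    out_thresh to edges between clusters (out-cluster pruning)."""
    S = X @ X.T  # cosine similarities, since rows of X have unit norm
    same = labels[:, None] == labels[None, :]
    keep = np.where(same, S >= in_thresh, S >= out_thresh)
    return np.where(keep, W, 0.0)


# Hypothetical toy term graph: two 3-term cliques joined by one weak bridge edge.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1

X = spectral_embedding(W, dim=2)
labels, C = kmeans(X, k=2)
P = prune(W, X, labels, in_thresh=0.5, out_thresh=0.9)
print(labels, np.count_nonzero(W), "->", np.count_nonzero(P))
```

In this toy graph the weak bridge is the only cross-cluster edge, and out-cluster pruning removes it, leaving 12 of the 14 nonzero entries. Using a stricter threshold across clusters than within them encodes the intuition that cross-concept edges are more likely to be noise; that asymmetry is a design choice of this sketch, not a detail taken from the paper.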
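The query-side step in the abstract (centroid-based re-weighting followed by k-NN expansion over precomputed embeddings) can be illustrated with a small sketch. The abstract does not give the re-weighting formula, so the scheme below is an assumption: a query term's weight is its cosine similarity to the centroid of its concept, and each query term is expanded with its k nearest neighbours in the embedding space, weighted by a hypothetical damping factor `alpha`. The names `concept_centroids` and `reweight_and_expand`, the parameter `alpha`, and the toy embeddings are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def concept_centroids(emb, labels):
    """Mean embedding of every cluster ("concept"); computed once, offline."""
    sums, counts = {}, {}
    for term, vec in emb.items():
        c = labels[term]
        acc = sums.setdefault(c, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[c] = counts.get(c, 0) + 1
    return {c: [x / counts[c] for x in acc] for c, acc in sums.items()}


def reweight_and_expand(query, emb, labels, centroids, k=1, alpha=0.5):
    """Return {term: weight} for the re-weighted, expanded query.

    Hypothetical scheme: each known query term is weighted by its cosine
    similarity to its concept centroid; its k nearest neighbours by cosine
    (brute force over the precomputed embeddings) are added with weight
    alpha * similarity."""
    weights = {}
    for t in query:
        if t not in emb:
            weights[t] = 1.0  # out-of-vocabulary term: keep with a neutral weight
            continue
        weights[t] = cosine(emb[t], centroids[labels[t]])
        candidates = [u for u in emb if u not in query]
        candidates.sort(key=lambda u: cosine(emb[t], emb[u]), reverse=True)
        for u in candidates[:k]:
            weights[u] = max(weights.get(u, 0.0), alpha * cosine(emb[t], emb[u]))
    return weights


# Hypothetical toy vocabulary with two concepts.
emb = {
    "car": [1.0, 0.1], "automobile": [0.9, 0.2], "vehicle": [0.8, 0.3],
    "banana": [0.1, 1.0], "fruit": [0.2, 0.9],
}
labels = {"car": 0, "automobile": 0, "vehicle": 0, "banana": 1, "fruit": 1}
centroids = concept_centroids(emb, labels)
print(reweight_and_expand(["car"], emb, labels, centroids, k=1))
```

Because the embeddings and centroids are computed offline, the query-time work reduces to similarity lookups; the brute-force neighbour search here keeps the sketch short and would typically be replaced by a nearest-neighbour index for a large vocabulary.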
Pages: 15