An index-based algorithm for fast on-line query processing of latent semantic analysis

被引：2

作者：

Zhang, Mingxi ^{[1
,2
]}

Li, Pohan ^{[2
]}

Wang, Wei ^{[2
]}

机构：

[1] Univ Shanghai Sci & Technol, Coll Commun & Art Design, Shanghai, Peoples R China

[2] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China

来源：

PLOS ONE | 2017年 / 12卷 / 05期

基金：

上海市自然科学基金;

关键词：

SINGULAR-VALUE DECOMPOSITION; JACOBI SVD ALGORITHM; SIMILARITY SEARCH; RECOMMENDER; NETWORK;

D O I：

10.1371/journal.pone.0177523

中图分类号：

O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];

学科分类号：

07 ; 0710 ; 09 ;

摘要：

Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold.. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm.

引用

页数：23

共 24 条

[1] Query and relevance feedback in latent semantic index with reduced time complexity
Yamout, F
Moghrabi, I
Oakes, M
PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON DATABASES AND APPLICATIONS, 2004, : 191 - 196
[2] Parallel Processing Design of Latent Semantic Analysis Based Essay Grading System with OpenMP
Ratna, Anak Agung Putri
Ibrahim, Ihsan
Purnamasari, Prima Dewi
PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2017), 2017, : 119 - 124
[3] Design and evaluation of a parallel document clustering algorithm based on hierarchical latent semantic analysis
Seshadri, Karthick
Iyer, K. Viswanathan
Shalinie, Mercy S.
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (13)
[4] Similarity-based Matrix Completion Algorithm for Latent Semantic Indexing
Mirzal, Andri
2013 IEEE INTERNATIONAL CONFERENCE ON CONTROL SYSTEM, COMPUTING AND ENGINEERING (ICCSCE 2013), 2013, : 79 - 84
[5] Efficient processing of similarity search under time warping in sequence databases: an index-based approach
Kim, SW
Park, S
Chu, WW
INFORMATION SYSTEMS, 2004, 29 (05) : 405 - 420
[6] Web Text Classification Based on Improved Latent Semantic Analysis
Wang, Lan
Wan, Yuan
2011 SECOND ETP/IITA CONFERENCE ON TELECOMMUNICATION AND INFORMATION (TEIN 2011), VOL 1, 2011, : 176 - 179
[7] A Comprehensive Method for Text Summarization Based on Latent Semantic Analysis
Wang, Yingjie
Ma, Jun
NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2013, 2013, 400 : 394 - 401
[8] The Index-Based Subgraph Matching Algorithm with General Symmetries (ISMAGS): Exploiting Symmetry for Faster Subgraph Enumeration
Houbraken, Maarten
Demeyer, Sofie
Michoel, Tom
Audenaert, Pieter
Colle, Didier
Pickavet, Mario
PLOS ONE, 2014, 9 (05):
[9] Fast line detection algorithm based on singular value decomposition
Yong, Yang
Wang, Bingxue
Huang, Baoping
Huang, Zili
Hongwai yu Jiguang Gongcheng/Infrared and Laser Engineering, 2011, 40 (05): : 953 - 957
[10] A New Approach for Multi-Document Summarization based on Latent Semantic Analysis
Xiong, Shuchu
Luo, Yihui
2014 SEVENTH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID 2014), VOL 1, 2014, : 177 - 180

← 1 2 3 →