An efficient document information retrieval using hybrid global search optimization algorithm with density based clustering technique

Cited by: 6

Authors:

Inje, Bhushan [1,2]

Nagwanshi, Kapil Kumar [2]

Rambola, Radha Krishna [3]

Affiliations:

[1] ASET Amity Univ Rajasthan, Dept Comp Sci, Jaipur, Rajasthan, India

[2] Guru Ghasidas Vishwavidyalaya A Cent Univ, Comp Sci & Engn, Bilaspur, India

[3] NMIMS Univ Mumbai, Dept Comp Sci, Shirpur, India

Source:

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2024, Vol. 27, No. 1

Keywords:

Optimization; Clustering; Preprocessing; Pattern mining; Document retrieval;

DOI:

10.1007/s10586-023-03976-1

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

As data volumes grow, retrieving the right document for a user efficiently becomes increasingly important, with various applications in the research community. In this work, we propose Hybrid Global Search Optimization with Density-based Clustering (HGSODC), which extends the current state of the art, mostly based on searching for documents using closed frequent terms, to deliver efficient results by alleviating the convergence problem. First, the documents are preprocessed by removing stop words and stemming, then grouped using hierarchical density-based spatial clustering of applications with noise (HDBSCAN), and closed frequent pattern mining is performed on each document. Second, the search is carried out using the HGSOA algorithm, and the documents are retrieved. We evaluate the effectiveness of the HGSODC approach through a set of experiments on the NPL, LISA, and CACM corpora. A wide range of evaluations against existing related work shows the strength of the proposed method in terms of precision, recall, MAP, F-score, accuracy, and convergence rate, based on multiple experiments comparing our approach with different baselines. The results indicate that the proposed HGSODC approach outperforms traditional document information retrieval methods in returned document quality and running time.
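The abstract names precision, recall, F-score, and MAP as evaluation measures but does not give formulas. The following is a minimal sketch of the standard textbook definitions for a single ranked result list, not the authors' evaluation code. The function names and the document IDs (`d1`, `d2`, ...) are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def precision_recall_f1(retrieved: Sequence[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for one query."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k taken at each rank k where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Normalize by the number of relevant documents, so missed documents lower the score.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: the mean of average precision over all queries."""
    scores: List[float] = [average_precision(r, rel) for r, rel in runs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical ranked result list and relevance judgments for one query.
    ranked = ["d1", "d4", "d2", "d7"]
    relevant = {"d1", "d2", "d3"}
    print(precision_recall_f1(ranked, relevant))
    print(average_precision(ranked, relevant))
```

In this example two of the four retrieved documents are relevant, so precision is 1/2 and recall is 2/3. Average precision also rewards ranking: the relevant documents appear at ranks 1 and 3, giving (1/1 + 2/3) / 3 = 5/9.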


Pages: 689-705

Page count: 17

