Cluster-based information retrieval using pattern mining

Cited by: 0

Authors:

Youcef Djenouri

Asma Belhadi

Djamel Djenouri

Jerry Chun-Wei Lin

Affiliations:

[1] Dept. of Mathematics and Cybernetics, Computer Science Research Center, Dept. of Computer Science and Creative Technology

[2] SINTEF Digital, Dept. of Computing, Mathematics and Physics

[3] Kristiania University College

[4] University of the West of England

[5] Western Norway University of Applied Sciences (HVL)

Source:

Applied Intelligence | 2021 / Vol. 51

Keywords:

Information retrieval; Data mining; Cluster-based approaches; Frequent and high-utility pattern mining

DOI:

Not available

Abstract:

This paper addresses the problem of responding to user queries by fetching the most relevant object from a clustered set of objects. It addresses the common drawbacks of cluster-based approaches and targets fast, high-quality information retrieval. For this purpose, a novel cluster-based information retrieval approach is proposed, named Cluster-based Retrieval using Pattern Mining (CRPM). This approach integrates various clustering and pattern mining algorithms. First, it groups similar objects into clusters. Three clustering algorithms, based on k-means, DBSCAN (density-based spatial clustering of applications with noise), and spectral clustering, are suggested to minimize the number of terms shared among the clusters. Second, frequent and high-utility pattern mining algorithms are applied to each cluster to extract the pattern bases. Third, the clusters of objects are ranked for every query. In this context, two ranking strategies are proposed: i) Score Pattern Computing (SPC), which calculates a score representing the similarity between a user query and a cluster; and ii) Weighted Terms in Clusters (WTC), which calculates a weight for every term and uses the relevant terms to compute the score between a user query and each cluster. Irrelevant information derived from the pattern bases is also used to deal with unexpected user queries. To evaluate the proposed approach, extensive experiments were carried out on two use cases: a document corpus and a tweet corpus. The results showed that the designed approach outperformed traditional and cluster-based information retrieval approaches in terms of the quality of the returned objects while being very competitive in terms of runtime.
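The three steps in the abstract (cluster, mine a pattern base per cluster, rank clusters per query) can be illustrated with a small sketch. This is not the paper's implementation: the clusters are given by hand instead of being produced by k-means, DBSCAN, or spectral clustering, the miner is a brute-force frequent-itemset enumeration, and the scoring rule (sum of the supports of the patterns contained in the query) is a hypothetical stand-in for SPC, whose exact formula is not given in this record. All function names and the toy data are assumptions made for illustration.

```python
from itertools import combinations
from collections import Counter


def mine_frequent_patterns(docs, min_support=0.5, max_size=2):
    """Return {pattern (frozenset of terms): relative support} for one cluster.

    Brute-force enumeration of itemsets up to max_size terms; adequate for a
    toy example, not a replacement for a real frequent-pattern miner.
    """
    counts = Counter()
    for terms in docs:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(terms), size):
                counts[frozenset(combo)] += 1
    n = len(docs)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def score_cluster(query_terms, pattern_base):
    """Hypothetical score: sum of supports of patterns fully contained in the query."""
    query = set(query_terms)
    return sum(s for p, s in pattern_base.items() if p <= query)


def rank_clusters(query_terms, pattern_bases):
    """Return cluster ids sorted by decreasing score for the query."""
    scores = {cid: score_cluster(query_terms, base)
              for cid, base in pattern_bases.items()}
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


# Toy clusters, assumed to be the output of a prior clustering step.
clusters = {
    "mining": [{"pattern", "mining", "frequent"},
               {"pattern", "mining", "utility"}],
    "retrieval": [{"query", "ranking", "retrieval"},
                  {"query", "retrieval", "index"}],
}

# One pattern base per cluster, then a ranking of clusters for one query.
pattern_bases = {cid: mine_frequent_patterns(docs)
                 for cid, docs in clusters.items()}
ranking = rank_clusters({"frequent", "pattern", "mining"}, pattern_bases)
print(ranking)
```

In this sketch only the top-ranked cluster would then be searched for the most relevant object, which is where a cluster-based approach saves work compared with scanning the whole collection.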

Pages: 1888-1903

Number of pages: 15
