A Novel Item Cluster-Based Collaborative Filtering Recommendation System

Cited by: 3

Authors:

Lu, Yuching [1]

Tozuka, Koki [1]

Chakraborty, Goutam [1]

Matsuhara, Masafumi [1]

Affiliations:

[1] Iwate Prefectural Univ, Fac Software & Informat Sci, Sugo 152-52, Takizawa, Iwate 0200693, Japan

Source:

Review of Socionetwork Strategies | 2021, Vol. 15, No. 2

Keywords:

Adjacency matrix; Similarity metrics; Fractional norm; Spectral clustering; Cluster evaluation; EFFICIENT;

DOI:

10.1007/s12626-021-00084-7

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

The recent exponential growth in users adopting mobile internet applications, such as e-commerce and social networks, warrants mining the huge volume of data collected from users' past actions in order to improve businesses and services. The core step in mining is to cluster the data meaningfully, in a way that conforms to the application. Social network data are structured, and a graphical presentation reveals that structure. Therefore, graph clustering is an effective way to reveal the underlying structure in the data. For clustering, calculating the similarity between a pair of vectors is the first step. The high dimensionality of the data, which are often noisy and sparse, makes distance measurement hard. In high dimensions, most conventional distance metrics fail to work, because the data points are distributed over the surface of the high-dimensional hyperspace. The traditional concepts of similarity and nearest neighbor do not hold. The variance of the distance between any pair of points shrinks as the dimension increases. In this work, we investigate the efficacy of various similarity measures and clustering algorithms on high-dimensional data. We experimented with a real-world high-dimensional data matrix, the ratings of movies by users. Clustering of movie items depends on a number of factors, such as genre, actors, directors, and whether the movie is widely acclaimed or obscure. We experimented with different similarity measures and clustering algorithms. Clustering results were evaluated by matching them against known annotations of the movies. Finally, we proposed a novel recommendation algorithm based on item clustering. Its performance was evaluated with different distance metrics and clustering algorithms. The methods elaborated here are applicable to other structured data generated in social network applications or in biological investigations.
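The abstract's remark that distances lose their discriminating power in high dimensions can be illustrated with a small experiment. The following is a minimal sketch and not the authors' method: it assumes synthetic points drawn uniformly from the unit hypercube (not the movie-rating data), and the helper names `minkowski_distance` and `relative_contrast` are hypothetical. It measures how far the farthest point is from a query relative to the nearest one, for a fractional norm (p = 0.5) and for the Manhattan and Euclidean norms.

```python
import random


def minkowski_distance(a, b, p):
    """L_p dissimilarity between two equal-length vectors.

    For p < 1 this is a fractional norm (not a true metric, since the
    triangle inequality can fail), but it is still usable for ranking.
    """
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def relative_contrast(dim, p, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point.

    Assumption: points are uniform in [0, 1]^dim, purely for illustration.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(minkowski_distance(query, point, p))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for dim in (2, 20, 200, 1000):
        cells = ", ".join(
            f"p={p}: {relative_contrast(dim, p):.3f}" for p in (0.5, 1.0, 2.0)
        )
        print(f"dim={dim:4d}  {cells}")
```

Running the script prints one row per dimension, so the contrast under each norm can be compared as the dimension grows. A contrast near zero means the nearest and farthest neighbors are almost equally distant, which is the situation in which nearest-neighbor similarity stops being informative.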

Pages: 327-346

Page count: 20
