Unsupervised strategies for shilling detection and robust collaborative filtering

Cited by: 109
Authors
Mehta, Bhaskar [1]
Nejdl, Wolfgang [2]
Affiliations
[1] Google Inc, CH-8004 Zurich, Switzerland
[2] Leibniz Univ Hannover, Forschungszentrum L3S, D-30167 Hannover, Germany
Keywords
Shilling; Collaborative filtering; Dimensionality reduction; PCA; PLSA; Robust statistics
DOI
10.1007/s11257-008-9050-4
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Collaborative filtering systems are essentially social systems which base their recommendations on the judgment of a large number of people. However, like other social systems, they are also vulnerable to manipulation by malicious social elements. Lies and propaganda may be spread by a malicious user who has an interest in promoting an item or in downplaying the popularity of another one. By doing this systematically, either with multiple identities or by involving more people, malicious user votes and profiles can be injected into a collaborative recommender system. This can significantly affect the robustness of a system or algorithm, as has been studied in previous work. While current detection algorithms are able to use certain characteristics of shilling profiles to detect them, they suffer from low precision and require a large amount of training data. In this work, we provide an in-depth analysis of shilling profiles and describe new approaches to detect malicious collaborative filtering profiles. In particular, we exploit the similarity structure in shilling user profiles to separate them from normal user profiles using unsupervised dimensionality reduction. We present two detection algorithms: one based on principal component analysis (PCA), the other on probabilistic latent semantic analysis (PLSA). Experimental results show much improved detection precision over existing methods, without the additional training time required by supervised approaches. Finally, we present a novel and highly effective robust collaborative filtering algorithm which builds on the ideas of the PCA-based detection algorithm.
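The abstract names PCA-based unsupervised detection but does not spell out the procedure. The following is only a minimal sketch of what such a detector might look like, not the authors' algorithm. The function name `pca_flag_profiles`, the parameters `n_components` and `n_flag`, the convention that 0 means "not rated", the per-user z-scoring, and the rule of flagging the users with the smallest loadings on the leading components are all assumptions made for illustration.

```python
import numpy as np


def pca_flag_profiles(ratings, n_components=3, n_flag=10):
    """Return indices of the n_flag users with the smallest PCA loadings.

    ratings: (n_users, n_items) array; 0 is assumed to mean "not rated".
    All names and defaults here are hypothetical, chosen for this sketch.
    """
    R = np.asarray(ratings, dtype=float)

    # Standardize each user profile (row) to zero mean and unit variance.
    mean = R.mean(axis=1, keepdims=True)
    std = R.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant profiles
    Z = (R - mean) / std

    # User-by-user covariance matrix: users are the variables here.
    C = Z @ Z.T / Z.shape[1]

    # eigh returns eigenvalues in ascending order, so the leading
    # principal components are the last columns of eigvecs.
    eigvals, eigvecs = np.linalg.eigh(C)
    top = eigvecs[:, -n_components:]

    # Score each user by the squared loadings on the leading components
    # and flag the users that contribute least (assumed selection rule).
    score = (top ** 2).sum(axis=1)
    return np.argsort(score)[:n_flag]
```

Calling `pca_flag_profiles(R, n_components=3, n_flag=20)` on a user-by-item matrix `R` returns 20 candidate user indices for inspection. Whether low-loading users really correspond to injected profiles depends on the attack model and the data, so the output is best treated as a list of candidates rather than a verdict.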
Pages: 65-97
Page count: 33