FSCOAL: Parallel simultaneous fuzzy co-clustering and learning

Cited by: 0
Authors
Biton, David [1]
Kalech, Meir [1]
Rokach, Lior [1]
Affiliations
[1] Ben-Gurion University of the Negev, Department of Software and Information Systems Engineering, Beer-Sheva, Israel
Keywords
distributed data mining; fuzzy co-clustering; predictive modeling
DOI
10.1002/int.21967
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Model-based co-clustering divides the data along two main axes and simultaneously trains a supervised model for each co-cluster using all other input features. For example, in the rating prediction task of a recommender system, the two main axes are items and users. In each co-cluster, we train a regression model that predicts the rating from other features such as user characteristics (e.g., gender), item characteristics (e.g., genre), contextual features (e.g., location), and so on. In reality, users and items do not necessarily belong to a single co-cluster, but can instead be associated with several co-clusters. We extend model-based co-clustering to support fuzzy co-clustering. In this setting, each item-user pair is associated with every co-cluster with some membership grade. This grade indicates how relevant the item-user pair is to the co-cluster. Furthermore, we propose a distributed algorithm, based on a map-reduce approach, to handle big datasets. An evaluation of the fuzzy co-clustering algorithm on three datasets shows a significant improvement compared with a regular co-clustering algorithm. In addition, a map-reduce version of the fuzzy co-clustering algorithm significantly reduces the runtime.
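To make the role of the membership grade concrete, the following is a minimal sketch of how a fuzzy prediction could differ from a hard (regular) co-clustering prediction. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not say how per-co-cluster predictions are combined, so the membership-weighted average used here is an assumption, and the names `fuzzy_predict`, `hard_predict`, `normalize`, and the two toy regression models are hypothetical.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]
Model = Callable[[Features], float]


def normalize(grades: Sequence[float]) -> List[float]:
    """Scale non-negative membership grades so they sum to 1."""
    total = sum(grades)
    if total <= 0:
        raise ValueError("membership grades must have a positive sum")
    return [g / total for g in grades]


def fuzzy_predict(grades: Sequence[float], models: Sequence[Model],
                  features: Features) -> float:
    """ASSUMPTION: combine co-cluster models by a membership-weighted average."""
    weights = normalize(grades)
    return sum(w * model(features) for w, model in zip(weights, models))


def hard_predict(grades: Sequence[float], models: Sequence[Model],
                 features: Features) -> float:
    """Regular co-clustering: use only the co-cluster with the highest grade."""
    best = max(range(len(models)), key=lambda k: grades[k])
    return models[best](features)


# Hypothetical per-co-cluster regression models over one extra feature.
def model_a(x: Features) -> float:
    return 2.0 + 1.0 * x[0]


def model_b(x: Features) -> float:
    return 5.0 - 0.5 * x[0]


if __name__ == "__main__":
    grades = [0.75, 0.25]   # membership of one item-user pair in two co-clusters
    features = [1.0]        # e.g. a single contextual feature
    print(fuzzy_predict(grades, [model_a, model_b], features))
    print(hard_predict(grades, [model_a, model_b], features))
```

In this sketch the hard variant ignores the second co-cluster entirely, while the fuzzy variant lets it contribute in proportion to its grade, which is the intuition the abstract describes for pairs that are relevant to several co-clusters.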
Pages: 1364-1380
Page count: 17