FSCOAL: Parallel simultaneous fuzzy co-clustering and learning

Cited by: 0

Authors:

Biton, David [1]

Kalech, Meir [1]

Rokach, Lior [1]

Affiliation:

[1] Ben Gurion Univ Negev, Software & Informat Syst Engn Dept, Beer Sheva, Israel

Source:

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS | 2018 / Vol. 33 / Issue 07

Keywords:

distributed data mining; fuzzy co-clustering; predictive modeling

DOI:

10.1002/int.21967

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Model-based co-clustering partitions the data along two main axes and simultaneously trains a supervised model for each co-cluster using all other input features. For example, in the rating prediction task of a recommender system, the two main axes are items and users. In each co-cluster, we train a regression model that predicts the rating from other features such as the user's characteristics (e.g., gender), the item's characteristics (e.g., genre), contextual features (e.g., location), and so on. In reality, users and items do not necessarily belong to a single co-cluster, but can be associated with several co-clusters. We extend model-based co-clustering to support fuzzy co-clustering. In this setting, each item-user pair is associated with every co-cluster with some membership grade. This grade indicates how relevant the item-user pair is to the co-cluster. Furthermore, we propose a distributed algorithm, based on a map-reduce approach, to handle big datasets. Evaluating the fuzzy co-clustering algorithm on three datasets shows a significant improvement compared with a regular co-clustering algorithm. In addition, a map-reduce version of the fuzzy co-clustering algorithm significantly reduces the runtime.
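The role of the membership grade can be illustrated with a minimal sketch. It assumes, purely for illustration, that each co-cluster holds a linear regression model and that the final rating is the membership-weighted average of the per-co-cluster predictions; the abstract does not state how the paper combines the local models, so the function `fuzzy_predict`, its parameters, and the toy numbers below are all hypothetical.

```python
def fuzzy_predict(features, memberships, models):
    """Blend per-co-cluster predictions by membership grade.

    features    -- feature values for one item-user pair
    memberships -- one non-negative grade per co-cluster (not all zero)
    models      -- one (weights, bias) linear model per co-cluster

    Assumption of this sketch: grades are normalized to sum to 1 and the
    local models are linear. Neither detail is taken from the paper.
    """
    total = sum(memberships)
    # Prediction of each co-cluster's local regression model.
    local_preds = [
        sum(w * x for w, x in zip(weights, features)) + bias
        for weights, bias in models
    ]
    # Membership-weighted average of the local predictions.
    return sum(m / total * p for m, p in zip(memberships, local_preds))


# Toy data: two co-clusters, two features (all values are made up).
features = [1.0, 0.0]
models = [([2.0, 0.0], 0.0), ([4.0, 0.0], 0.0)]

# Grades (1, 0) reproduce regular (crisp) co-clustering: one model decides.
crisp = fuzzy_predict(features, [1.0, 0.0], models)
# Equal grades blend both local models.
fuzzy = fuzzy_predict(features, [0.5, 0.5], models)
```

With grades of 1 and 0 the sketch reduces to regular co-clustering, where a single local model produces the rating; intermediate grades let several co-clusters contribute to the same item-user pair.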


Pages: 1364-1380

Number of pages: 17

