Fuzzy rule based classification systems for big data with MapReduce: granularity analysis

Cited by: 35

Authors:

Fernandez, Alberto [1]

del Rio, Sara [1]

Bawakid, Abdullah [2]

Herrera, Francisco [1,2]

Affiliations:

[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada, Spain

[2] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah, Saudi Arabia

Source:

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION | 2017, Vol. 11, No. 4

Keywords:

Big data; Fuzzy rule based classification systems; Granularity; MapReduce; Hadoop; DATA SCIENCE; CHALLENGES; PROPOSAL

DOI:

10.1007/s11634-016-0260-z

Chinese Library Classification (CLC):

O21 [Probability theory and mathematical statistics]; C8 [Statistics]

Subject classification codes:

020208; 070103; 0714

Abstract:

Due to the vast amount of information available nowadays, and the advantages related to processing this data, the topics of big data and data science have acquired great importance in current research. Big data applications are mainly about scalability, which can be achieved via the MapReduce programming model. It is designed to divide the data into several chunks or groups that are processed in parallel, and whose results are "assembled" to provide a single solution. Among the different classification paradigms adapted to this new framework, fuzzy rule based classification systems have shown interesting results with a MapReduce approach for big data. It is well known that the performance of these types of systems depends strongly on the selection of a good granularity level for the Data Base. However, in the context of MapReduce this parameter is even harder to determine, as it can also be related to the number of Maps chosen for the processing stage. In this paper, we aim at analyzing the interrelation between the number of labels of the fuzzy variables and the scarcity of the data due to the data sampling in MapReduce. Specifically, we consider that as the partitioning of the initial instance set grows, the level of granularity necessary to achieve a good performance also becomes higher. The experimental results, carried out for several Big Data problems using the Chi-FRBCS-BigData algorithms, support our claims.
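
The two parameters discussed in the abstract, the number of Maps and the number of fuzzy labels per variable, can be illustrated with a small single-process sketch. This is not the Chi-FRBCS-BigData implementation: it assumes attributes scaled to [0, 1], uniform triangular labels, and a simplified fusion step that sums matching degrees per class instead of computing rule weights. The names `triangular_memberships`, `map_phase`, `reduce_phase` and `learn` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def triangular_memberships(x, n_labels):
    """Membership of x (assumed in [0, 1]) in n_labels >= 2 uniform triangular labels."""
    step = 1.0 / (n_labels - 1)
    return [max(0.0, 1.0 - abs(x - i * step) / step) for i in range(n_labels)]


def map_phase(chunk, n_labels):
    """Build candidate rules from one data chunk (one 'Map').

    Each example yields one rule whose antecedent is the best-matching
    label index per attribute. Simplification: the matching degree is
    accumulated per class instead of computing a rule weight.
    """
    rules = defaultdict(lambda: defaultdict(float))
    for features, label in chunk:
        antecedent = []
        degree = 1.0
        for x in features:
            mu = triangular_memberships(x, n_labels)
            best = max(range(n_labels), key=mu.__getitem__)
            antecedent.append(best)
            degree *= mu[best]
        rules[tuple(antecedent)][label] += degree
    return rules


def reduce_phase(rule_bases):
    """Fuse the per-chunk rule bases into a single rule base ('Reduce')."""
    merged = defaultdict(lambda: defaultdict(float))
    for rule_base in rule_bases:
        for antecedent, class_degrees in rule_base.items():
            for label, degree in class_degrees.items():
                merged[antecedent][label] += degree
    # Resolve conflicting consequents by keeping the class with most support.
    return {a: max(cd, key=cd.get) for a, cd in merged.items()}


def learn(data, n_maps, n_labels):
    """Split data into n_maps chunks, learn rules per chunk, then fuse them."""
    chunks = [data[i::n_maps] for i in range(n_maps)]
    return reduce_phase(map_phase(chunk, n_labels) for chunk in chunks)


if __name__ == "__main__":
    toy = [((0.1, 0.2), "a"), ((0.9, 0.8), "b"),
           ((0.15, 0.1), "a"), ((0.85, 0.95), "b")]
    print(learn(toy, n_maps=2, n_labels=3))
```

In this sketch, raising `n_maps` leaves fewer examples in each chunk, while raising `n_labels` makes each rule cover a smaller region of the input space; the interplay between these two settings is the subject the abstract describes.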


Pages: 711-730

Page count: 20
