An overview of recent distributed algorithms for learning fuzzy models in Big Data classification

Times cited: 16
Authors
Ducange, Pietro [1]
Fazzolari, Michela [2]
Marcelloni, Francesco [1]
Affiliations
[1] Dipartimento Ingn Informaz, Largo Lucio Lazzarino 1, I-56122 Pisa, Italy
[2] CNR, Ist Informat & Telemat, Via Giuseppe Moruzzi 1, I-56124 Pisa, Italy
Keywords
Big Data; Fuzzy models; Data mining; Classification algorithms; Distributed computing; MULTIOBJECTIVE EVOLUTIONARY APPROACH; ASSOCIATIVE CLASSIFICATION; CLUSTERING-ALGORITHM; SYSTEMS; MAPREDUCE; ANALYTICS; DESIGN; GRANULARITY; CLASSIFIERS; SELECTION;
DOI
10.1186/s40537-020-00298-6
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Huge amounts of data are now generated, often in very short time intervals and in various formats, by heterogeneous sources such as social networks and media, mobile devices, internet transactions, and networked devices and sensors. These data, identified as Big Data in the literature, are characterized by the popular Vs features, such as Value, Veracity, Variety, Velocity and Volume. In particular, Value focuses on the useful knowledge that may be mined from data. Thus, in recent years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have generally been implemented using ad hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models currently play a significant role, thanks to their capability of handling vague and imprecise data and their inherent interpretability. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. We first present some design and implementation details of these learning algorithms. We then compare them in terms of accuracy and interpretability. Finally, we discuss their scalability.
Pages: 29