Comprehensive Analysis of Various Big Data Classification Techniques: A Challenging Overview

Cited by: 4

Authors:

Abdalla, Hemn Barzan [1]

Abuhaija, Belal [1]

Affiliations:

[1] Wenzhou Kean Univ, Dept Comp Sci, Wenzhou, Peoples R China

Source:

JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT | 2023 / Vol. 22 / No. 01

Keywords:

Data mining; big data; semantic similarity measures; support vector machine; K-nearest neighbor; MAPREDUCE FRAMEWORK; ALGORITHM;

DOI:

10.1142/S0219649222500836

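Chinese Library Classification (CLC) number:

G25 [Library science, librarianship]; G35 [Information science, information work];

Discipline classification code: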

1205 ; 120501 ;

Abstract:

Data on the internet is growing every day, and automatically mining essential information from enormous amounts of data has become a challenging task for organisations with huge datasets. In recent years, big data has become a prominent technology in the domain of Information Technology (IT); it consists largely of unstructured data and addresses the computational complexity that classical database systems face. The data is fast, large, and typically derived from multiple independent sources. The three main challenges for big data utilisation are data access, semantics, and domain knowledge, along with the complexities raised by big data volumes. One of the major difficulties is the classification of big data. This paper introduces well-defined methodologies employed for big data classification. It reviews 50 research papers on big data classification methods, which are grouped into six categories: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), fuzzy-based methods, Bayesian-based methods, Random Forest, and Decision Tree. A detailed analysis and discussion is then carried out with respect to classification techniques, datasets used, evaluation metrics, semantic similarity measures, and publication year. Finally, research gaps and issues in several traditional big data classification techniques are explained to help researchers extend their work toward effective big data management.

Pages: 22

References: 53

