A Structured Approach Towards Big Data Identification

Cited by: 2

Authors:

Ahmed, Hameeza [1]

Ismail, Muhammad Ali [1]

Affiliation:

[1] NED Univ Engn & Technol, Dept Comp & Informat Syst Engn, Karachi 75270, Sindh, Pakistan

Source:

IEEE TRANSACTIONS ON BIG DATA | 2023 / Vol. 9 / No. 01

Keywords:

Big Data; Hardware; Complexity theory; Real-time systems; Personnel; Optimization; Mathematical models; Big data; identification; 3Vs; offloading; mathematical equations; DATA ANALYTICS; BENCHMARK SUITE; DATA CHALLENGES; MAPREDUCE; INTERNET; IOT; FRAMEWORK; SYSTEMS; THINGS; TECHNOLOGIES;

DOI:

10.1109/TBDATA.2021.3139069

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Big data is a "relative" concept: it is the combination of data, application, and platform properties. The term big data has been applied to almost every problem involving large, real-time, or heterogeneous data. However, these data attributes alone are not enough to identify big data, because they ignore the application and platform properties needed to find processing thresholds. Equivocal identification of big data can lead to inefficient use of optimization techniques, resulting in global inefficiency, reduced system performance, increased power consumption, greater effort from the programming team, and misallocation of the hardware resources required for the task. To address this, a structured approach for identifying big data is presented. The approach is based on three equations that categorize the Volume, Velocity, and Variety characteristics by relating data, application, and platform properties. Identifying the 3Vs is necessary for enabling the relevant optimization techniques. Beyond identifying the 3Vs, it is necessary to discriminate whether the big data is due to 1V, 2Vs, or 3Vs, since the involvement of more Vs increases problem complexity. Accordingly, a classification of big data into strong, moderate, or weak levels is proposed. To evaluate the proposed methods, a set of well-known applications has been tested and categorized, showing savings of up to 58% in main memory and 44% in disk reads, and prescribing lower clock rates, fewer cores, sequential programming, and non-adaptive processing and storage formats. Moreover, four case studies reported as big data have been analyzed with the proposed system: it categorizes two of them as weak low big data presenting only volume and the third as weak medium due to velocity, whereas in the fourth no V is involved.
The proposed equations also reduce computation and human resources by up to 75% compared with Spark cluster execution. In this way, the proposed work can avoid unnecessary investments through relevant prescriptions. Furthermore, the proposed equations can be integrated into different tools to assist selective offloading of big data workloads to appropriate software and hardware solutions.
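The abstract does not reproduce the three equations, so the following is only a minimal Python sketch of the level-classification step, under stated assumptions. Each V is treated as "involved" when a hypothetical application demand exceeds a hypothetical platform threshold (`VCheck`, `demand`, and `threshold` are illustrative names, not the authors'), and the count of involved Vs is assumed to map 1/2/3 to weak/moderate/strong. The low/medium sub-grades mentioned for the case studies are not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Big data levels named in the abstract, plus a 'no V involved' case."""
    NONE = 0      # no V exceeds its threshold: not big data on this platform
    WEAK = 1      # exactly one V involved
    MODERATE = 2  # two Vs involved
    STRONG = 3    # all three Vs involved


@dataclass
class VCheck:
    """Hypothetical pairing of an application demand with a platform threshold."""
    demand: float     # e.g. bytes to process, events per second, number of formats
    threshold: float  # what the target platform can handle for this V

    def involved(self) -> bool:
        # Assumption: a V counts only when demand exceeds the platform threshold.
        return self.demand > self.threshold


def classify(volume: VCheck, velocity: VCheck, variety: VCheck) -> Level:
    """Count the involved Vs and map 0/1/2/3 to none/weak/moderate/strong."""
    count = sum(v.involved() for v in (volume, velocity, variety))
    return Level(count)
```

With illustrative numbers, `classify(VCheck(200e9, 64e9), VCheck(1e3, 1e5), VCheck(1, 4))` flags only volume and returns `Level.WEAK`, similar in shape to the volume-only case studies. Keeping the per-V check separate from the counting step reflects the abstract's point that identification depends on platform properties: the same data can be weak big data on one platform and not big data on another.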

Pages: 147-159

Page count: 13
