A Structured Approach Towards Big Data Identification

Cited by: 2
Authors
Ahmed, Hameeza [1]
Ismail, Muhammad Ali [1]
Affiliations
[1] NED Univ Engn & Technol, Dept Comp & Informat Syst Engn, Karachi 75270, Sindh, Pakistan
Keywords
Big Data; Hardware; Complexity theory; Real-time systems; Personnel; Optimization; Mathematical models; Big data; identification; 3Vs; offloading; mathematical equations; DATA ANALYTICS; BENCHMARK SUITE; DATA CHALLENGES; MAPREDUCE; INTERNET; IOT; FRAMEWORK; SYSTEMS; THINGS; TECHNOLOGIES;
DOI
10.1109/TBDATA.2021.3139069
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Big data is a "relative" concept. It is the combination of data, application, and platform properties. The term big data has been applied to almost every problem involving large, real-time, or heterogeneous data. However, these data attributes alone are not enough to identify big data, because they ignore the application and platform properties that determine the processing thresholds. Ambiguous identification of big data can lead to inefficient use of optimization techniques, resulting in global inefficiency, reduced system performance, increased power consumption, greater effort on the part of the programming team, and misallocation of the hardware resources required for the task. To address this, a structured approach is presented for the identification of big data. The approach is based on three equations that categorize the Volume, Velocity, and Variety characteristics by relating data, application, and platform properties. Identifying the 3Vs is necessary for enabling the relevant optimization techniques. In addition to identifying the 3Vs, it is necessary to determine whether the big data is due to 1V, 2Vs, or 3Vs, as the involvement of more Vs increases the problem complexity. To this end, a classification of big data into strong, moderate, or weak levels is proposed. To evaluate the proposed methods, a set of well-known applications was tested and categorized, showing savings of up to 58% in main memory and 44% in disk reads, and prescribing a lower clock rate, fewer cores, sequential programming, and non-adaptive processing and storage formats. Moreover, four case studies reported as big data were analyzed according to the proposed system. The proposed method categorizes two of the case studies as weak low big data presenting only volume and the third as weak medium due to velocity, whereas in the fourth case no V is involved.
Also, the proposed equations reduce the computation and human resources by up to 75% relative to Spark cluster execution. In this manner, the proposed work can avoid unnecessary investments by prescribing the relevant resources. Furthermore, the proposed equations can be integrated into different tools to assist selective offloading of big data workloads to appropriate software and hardware solutions.
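The level classification described in the abstract can be illustrated with a minimal sketch. It assumes that the three equations have already produced a yes/no outcome for each V; the equations themselves are not given in this record and are not reproduced here. The function name `big_data_level` and the mapping of one, two, and three Vs to weak, moderate, and strong are assumptions made for illustration, not the authors' definitions.

```python
from typing import Optional


def big_data_level(volume: bool, velocity: bool, variety: bool) -> Optional[str]:
    """Map the number of Vs present to a level (hypothetical mapping).

    Assumption: 1V -> "weak", 2Vs -> "moderate", 3Vs -> "strong".
    Returns None when no V is involved, i.e. the workload is not
    identified as big data.
    """
    count = sum((volume, velocity, variety))
    levels = {0: None, 1: "weak", 2: "moderate", 3: "strong"}
    return levels[count]


if __name__ == "__main__":
    # A workload presenting only volume, like two of the case studies.
    print(big_data_level(volume=True, velocity=False, variety=False))
    # A workload where no V is involved, like the fourth case study.
    print(big_data_level(volume=False, velocity=False, variety=False))
```

The sub-levels mentioned in the abstract (for example "weak low" for volume and "weak medium" for velocity) are left out of the sketch because the record does not define them fully.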
Pages: 147-159
Page count: 13