A Relevance-based approach for Big Data Exploration

Cited by: 23
Authors
Bagozi, Ada [1]
Bianchini, Devis [1]
De Antonellis, Valeria [1]
Garda, Massimiliano [1]
Marini, Alessandro [1]
Affiliations
[1] Univ Brescia, Dept Informat Engn, Via Branze 38, I-25123 Brescia, Italy
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2019 / Vol. 101
Keywords
Data exploration; Big data; Multi-dimensional data modelling; Human-In-the-Loop Data Analysis; Industry 4.0; Cyber Physical Systems
DOI
10.1016/j.future.2019.05.056
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The collection, organisation and analysis of large amounts of data (Big Data) in different application domains still require the involvement of experts to identify relevant data only, without being overwhelmed by the volume, velocity and variety of collected data. According to the "Human-In-the-Loop Data Analysis" vision, experts explore data to take decisions in unexpected situations, based on their long-term experience. In this paper, the IDEAaS (Interactive Data Exploration As-a-Service) approach is presented, which enables Big Data Exploration (BDE) according to data relevance. In the approach, novel techniques have been developed: (i) an incremental clustering algorithm, to provide a summarised representation of collected data streams; (ii) a multi-dimensional organisation of summarised data, for data exploration according to different analysis dimensions; (iii) data relevance evaluation techniques, to draw the experts' attention to relevant data only during exploration. The approach has been applied to BDE for state detection in the Industry 4.0 domain, given the strategic importance of Big Data management as an enabling technology in this field. In particular, a stream of numeric features is collected from a Cyber Physical System and is explored to monitor the system health status, supporting the identification of unknown anomalous conditions. Results of an extensive experimentation in the Industry 4.0 domain are presented in the paper and demonstrate the effectiveness of the developed techniques in drawing the attention of experts to relevant data, also beyond the considered domain, in the presence of disruptive characteristics of Big Data, namely volume (millions of collected records), velocity (measured in milliseconds) and variety (number and heterogeneity of analysis dimensions). © 2019 Published by Elsevier B.V.
Pages: 51-69
Number of pages: 19