Big Data: from collection to visualization

Cited by: 0
Authors
Mohammed Ghesmoune
Hanene Azzag
Salima Benbernou
Mustapha Lebbah
Tarn Duong
Mourad Ouziri
Affiliations
[1] University of Paris 13, LIPN
[2] Sorbonne Paris City, UMR 7030
[3] University of Paris Descartes, CNRS
[4] Sorbonne Paris City, LIPADE
Source
Machine Learning | 2017 / Vol. 106
Keywords
Data fusion; RDF; Semantic; Entity resolution; Big data; Map-Reduce; Spark; Data stream clustering; Micro-Batch streaming; GNG; Topological structure; Visualization;
DOI
Not available
Abstract
Organisations increasingly rely on Big Data to discover correlations and patterns that would previously have remained hidden, and to use this new information to improve the quality of their business activities. In this paper we present a ‘story’ of Big Data from initial data collection to final visualization, passing through data fusion, analysis and clustering. To this end, we present a complete workflow covering (a) how to represent the heterogeneous collected data using the high-performance RDF language, how to fuse Big Data in RDF by resolving the issue of entity disambiguation, and how to query those data to provide more relevant and complete knowledge; and (b) because the data arrive as data streams, batchStream, a micro-batching version of the growing neural gas (GNG) approach, which clusters data streams in a single pass over the data. The batchStream algorithm discovers clusters of arbitrary shape without any assumption on the number of clusters. This Big Data workflow is implemented on the Spark platform and demonstrated on synthetic and real data.
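To make the micro-batch idea concrete, the following is a minimal sketch of a growing-neural-gas-style update applied batch by batch in a single pass. It is not the authors' batchStream implementation: it runs on one machine rather than on Spark, it omits edge aging and node removal, and the names `process_batch`, `cluster_stream`, `lr`, `batch_size` and `max_nodes` are hypothetical choices made for illustration only.

```python
import math


def nearest_two(nodes, x):
    """Return the indices of the closest and second-closest nodes to x."""
    order = sorted(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))
    return order[0], order[1]


def process_batch(nodes, errors, edges, batch, lr=0.5, max_nodes=10):
    """Apply one simplified GNG-style update for a single micro-batch.

    Assumption: edge aging and node removal are left out for brevity.
    """
    sums = {}  # winner index -> (coordinate sums, count) of the points it won
    for x in batch:
        b, s = nearest_two(nodes, x)
        edges.add(frozenset((b, s)))  # connect the winner and the runner-up
        errors[b] += math.dist(nodes[b], x) ** 2
        acc, n = sums.get(b, ([0.0] * len(x), 0))
        sums[b] = ([a + xi for a, xi in zip(acc, x)], n + 1)

    # Move each winning node toward the mean of the points it won.
    for i, (acc, n) in sums.items():
        mean = [a / n for a in acc]
        nodes[i] = [w + lr * (m - w) for w, m in zip(nodes[i], mean)]

    # Grow: add a node halfway between the highest-error node and its
    # highest-error neighbour, as long as the node budget allows it.
    if len(nodes) < max_nodes:
        q = max(range(len(nodes)), key=lambda i: errors[i])
        neighbours = [j for e in edges if q in e for j in e if j != q]
        if neighbours:
            f = max(neighbours, key=lambda j: errors[j])
            nodes.append([(a + b) / 2 for a, b in zip(nodes[q], nodes[f])])
            new = len(nodes) - 1
            errors[q] *= 0.5
            errors[f] *= 0.5
            errors.append(errors[q])
            edges.discard(frozenset((q, f)))
            edges.add(frozenset((q, new)))
            edges.add(frozenset((f, new)))


def cluster_stream(stream, batch_size=50, **kwargs):
    """Cluster a stream in one pass; assumes it holds at least two points."""
    it = iter(stream)
    # The first two points only seed the graph and are not replayed.
    nodes = [list(next(it)), list(next(it))]
    errors = [0.0, 0.0]
    edges = set()
    batch = []
    for x in it:
        batch.append(x)
        if len(batch) == batch_size:
            process_batch(nodes, errors, edges, batch, **kwargs)
            batch = []
    if batch:  # flush the final, possibly shorter, micro-batch
        process_batch(nodes, errors, edges, batch, **kwargs)
    return nodes, edges
```

Calling `cluster_stream(points, batch_size=50, max_nodes=6)` on an iterable of equal-length numeric tuples returns the node positions and the set of edges between node indices. Each point is read once, and the number of nodes grows with the data instead of being fixed in advance, which is the property the abstract attributes to the GNG family.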
Pages: 837 - 862
Page count: 25
Related papers
50 in total
  • [21] Financial Big data Visualization: A Machine Learning Perspective
    Dong, Alice Xiaodan
    Huang, Weidong
    Wang, Jitong
    17TH INTERNATIONAL SYMPOSIUM ON VISUAL INFORMATION COMMUNICATION AND INTERACTION, VINCI 2024, 2024,
  • [22] Investigation into the efficacy of geospatial big data visualization tools
    Barik, Rabindra K.
    Lenka, Rakesh K.
    Ali, Syed Mohd
    Gupta, Noopur
    Satpathy, Ananya
    Raj, Ankit
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND AUTOMATION (ICCCA), 2017, : 88 - 92
  • [23] Spatial-Crowd: A Big Data Framework for Efficient Data Visualization
    Atta, Shahbaz
    Sadiq, Bilal
    Ahmad, Akhlaq
    Saeed, Sheikh Nasir
    Felemban, Emad
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 2130 - 2138
  • [24] Progressive Clustering of Big Data with GPU Acceleration and Visualization
    Wang, Jun
    Papenhausen, Eric
    Wang, Bing
    Ha, Sungsoo
    Zelenyuk, Alla
    Mueller, Klaus
    2017 NEW YORK SCIENTIFIC DATA SUMMIT (NYSDS), 2017,
  • [25] A Study on Garbage Collection Algorithms for Big Data Environments
    Bruno, Rodrigo
    Ferreira, Paulo
    ACM COMPUTING SURVEYS, 2018, 51 (01)
  • [26] Big Data Visualization and Visual Analytics of COVID-19 Data
    Leung, Carson K.
    Chen, Yubo
    Hoi, Calvin S. H.
    Shang, Siyuan
    Wen, Yan
    Cuzzocrea, Alfredo
    2020 24TH INTERNATIONAL CONFERENCE INFORMATION VISUALISATION (IV 2020), 2020, : 415 - 420
  • [27] Big network traffic data visualization
    Zichan Ruan
    Yuantian Miao
    Lei Pan
    Yang Xiang
    Jun Zhang
    Multimedia Tools and Applications, 2018, 77 : 11459 - 11487
  • [28] Big network traffic data visualization
    Ruan, Zichan
    Miao, Yuantian
    Pan, Lei
    Xiang, Yang
    Zhang, Jun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (09) : 11459 - 11487
  • [29] Big Data Analysis and Services: Visualization of Smart Data to Support Healthcare Analytics
    Leung, Carson K.
    Zhang, Yibin
    Hoi, Calvin S. H.
    Souza, Joglas
    Wodi, Bryan H.
    2019 INTERNATIONAL CONFERENCE ON INTERNET OF THINGS (ITHINGS) AND IEEE GREEN COMPUTING AND COMMUNICATIONS (GREENCOM) AND IEEE CYBER, PHYSICAL AND SOCIAL COMPUTING (CPSCOM) AND IEEE SMART DATA (SMARTDATA), 2019, : 1261 - 1268
  • [30] Visualization communication mode and path optimization of data news in the context of big data
    Zhang H.
    Applied Mathematics and Nonlinear Sciences, 2024, 9 (01)