Big Data: from collection to visualization

Cited by: 0
Authors
Mohammed Ghesmoune
Hanene Azzag
Salima Benbernou
Mustapha Lebbah
Tarn Duong
Mourad Ouziri
Affiliations
[1] University of Paris 13, LIPN
[2] Sorbonne Paris City, UMR 7030
[3] University of Paris Descartes, CNRS
[4] Sorbonne Paris City, LIPADE
Source
Machine Learning | 2017 / Vol. 106
Keywords
Data fusion; RDF; Semantic; Entity resolution; Big data; Map-Reduce; Spark; Data stream clustering; Micro-Batch streaming; GNG; Topological structure; Visualization;
DOI
Not available
Abstract
Organisations increasingly rely on Big Data to discover correlations and patterns that would previously have remained hidden, and to use this new information to improve the quality of their business activities. In this paper we present a ‘story’ of Big Data, from the initial data collection to the final visualization, passing through data fusion, analysis and clustering. To this end, we present a complete workflow covering (a) how to represent the heterogeneous collected data using the high performance RDF language, how to fuse the Big Data in RDF by resolving the issue of entity disambiguation, and how to query those data to provide more relevant and complete knowledge; and (b) since the data arrive as data streams, batchStream, a micro-batching version of the growing neural gas (GNG) approach that clusters data streams in a single pass over the data. The batchStream algorithm discovers clusters of arbitrary shape without any assumption on the number of clusters. The workflow is implemented on the Spark platform and demonstrated on synthetic and real data.
Pages: 837-862
Page count: 25