Big Data: from collection to visualization

Cited by: 0
Authors
Mohammed Ghesmoune
Hanene Azzag
Salima Benbernou
Mustapha Lebbah
Tarn Duong
Mourad Ouziri
Affiliations
[1] University of Paris 13, LIPN
[2] Sorbonne Paris City, UMR 7030
[3] University of Paris Descartes, CNRS
[4] Sorbonne Paris City, LIPADE
Source
Machine Learning | 2017 / Vol. 106
Keywords
Data fusion; RDF; Semantic; Entity resolution; Big data; Map-Reduce; Spark; Data stream clustering; Micro-Batch streaming; GNG; Topological structure; Visualization;
DOI: Not available
Abstract
Organisations increasingly rely on Big Data to discover correlations and patterns in data that would previously have remained hidden, and then use this new information to improve the quality of their business activities. In this paper we present a ‘story’ of Big Data from the initial data collection to the final visualization, passing through data fusion and the analysis and clustering tasks. To this end, we present a complete workflow covering (a) how to represent the heterogeneous collected data in RDF, how to fuse the Big Data in RDF by resolving entity ambiguity (entity resolution), and how to query the fused data to obtain more relevant and complete knowledge; and (b), since the data are received as data streams, we propose batchStream, a micro-batching version of the growing neural gas (GNG) approach that clusters data streams in a single pass over the data. The batchStream algorithm discovers clusters of arbitrary shape without any assumption on the number of clusters. This Big Data workflow is implemented on the Spark platform and demonstrated on synthetic and real data.
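The abstract describes batchStream only as a micro-batching version of growing neural gas (GNG). The idea can be illustrated with a minimal pure-Python sketch. It applies the classic GNG rules to one micro-batch at a time and sees each point once:

* a winner and a runner-up are chosen for each point;
* edges from the winner are aged, and old edges and isolated nodes are pruned;
* error-driven nodes are inserted.

Clusters are read off as connected components of the learned graph. This is how GNG-style methods avoid fixing the number of clusters in advance.

This is not the authors' batchStream and does not use Spark. The class `MicroBatchGNG`, its parameter defaults, and the choice to insert exactly one node per micro-batch are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))  # squared Euclidean distance


def _key(i: int, j: int) -> Tuple[int, int]:
    return (i, j) if i < j else (j, i)  # undirected edge key


class MicroBatchGNG:
    """Hypothetical sketch: classic GNG rules applied one micro-batch at a time."""

    def __init__(self, eps_b=0.2, eps_n=0.006, max_age=50,
                 alpha=0.5, decay=0.9995, max_nodes=50):
        self.eps_b, self.eps_n = eps_b, eps_n        # winner / neighbour learning rates
        self.max_age, self.alpha = max_age, alpha    # edge lifetime, error split on insert
        self.decay, self.max_nodes = decay, max_nodes
        self.w: Dict[int, List[float]] = {}          # node id -> prototype vector
        self.err: Dict[int, float] = {}              # node id -> accumulated error
        self.edges: Dict[Tuple[int, int], int] = {}  # (i, j), i < j -> edge age
        self._next_id = 0

    def _add_node(self, vec, error=0.0) -> int:
        nid, self._next_id = self._next_id, self._next_id + 1
        self.w[nid], self.err[nid] = list(vec), error
        return nid

    def _neighbors(self, i: int) -> List[int]:
        return [b if a == i else a for (a, b) in self.edges if i in (a, b)]

    def partial_fit(self, batch):
        """Single pass over one micro-batch: each point is used exactly once."""
        for x in batch:
            if len(self.w) < 2:
                self._add_node(x)  # seed the graph with the first two points
            else:
                self._adapt(x)
        if batch and 2 <= len(self.w) < self.max_nodes:
            self._insert_node()  # assumption: grow by one node per micro-batch
        return self

    def _adapt(self, x):
        s1, s2 = sorted(self.w, key=lambda n: _sq_dist(self.w[n], x))[:2]
        for k in self.edges:                          # age edges of the winner
            if s1 in k:
                self.edges[k] += 1
        self.err[s1] += _sq_dist(self.w[s1], x)       # accumulate winner error
        self.w[s1] = [w + self.eps_b * (xi - w) for w, xi in zip(self.w[s1], x)]
        for n in self._neighbors(s1):                 # drag topological neighbours
            self.w[n] = [w + self.eps_n * (xi - w) for w, xi in zip(self.w[n], x)]
        self.edges[_key(s1, s2)] = 0                  # create or refresh s1-s2 edge
        self.edges = {k: a for k, a in self.edges.items() if a <= self.max_age}
        linked = {n for k in self.edges for n in k}
        for n in [n for n in self.w if n not in linked]:  # drop isolated nodes
            del self.w[n], self.err[n]
        for n in self.err:                            # global error decay
            self.err[n] *= self.decay

    def _insert_node(self):
        q = max(self.err, key=self.err.get)           # node with largest error
        nbrs = self._neighbors(q)
        if not nbrs:
            return
        f = max(nbrs, key=self.err.get)               # its worst neighbour
        self.err[q] *= self.alpha
        self.err[f] *= self.alpha
        mid = [(a + b) / 2 for a, b in zip(self.w[q], self.w[f])]
        r = self._add_node(mid, self.err[q])
        del self.edges[_key(q, f)]                    # split edge q-f through r
        self.edges[_key(q, r)] = 0
        self.edges[_key(r, f)] = 0

    def clusters(self) -> Dict[int, int]:
        """Label nodes by connected component: no number of clusters is fixed."""
        label, current = {}, 0
        for start in self.w:
            if start in label:
                continue
            label[start], stack = current, [start]
            while stack:
                for m in self._neighbors(stack.pop()):
                    if m not in label:
                        label[m] = current
                        stack.append(m)
            current += 1
        return label


if __name__ == "__main__":
    rng = random.Random(1)
    stream = [(rng.gauss(c, 0.3), rng.gauss(c, 0.3)) for c in (0, 5) for _ in range(500)]
    rng.shuffle(stream)
    model = MicroBatchGNG()
    for i in range(0, len(stream), 100):  # micro-batches of 100 points
        model.partial_fit(stream[i:i + 100])
    print(len(model.w), "nodes,", len(set(model.clusters().values())), "clusters")
```

Inserting one node per micro-batch, rather than every fixed number of points as in textbook GNG, is only one plausible way to tie graph growth to the micro-batch boundary. The real batchStream may schedule insertion, forgetting and edge handling differently.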
Pages: 837 - 862
Number of pages: 25
Related papers (50 in total)
  • [1] Big Data: from collection to visualization
    Ghesmoune, Mohammed
    Azzag, Hanene
    Benbernou, Salima
    Lebbah, Mustapha
    Duong, Tarn
    Ouziri, Mourad
    MACHINE LEARNING, 2017, 106 (06) : 837 - 862
  • [2] Visualization of Big Data
    Kung, Sun-Yuan
    PROCEEDINGS OF 2015 IEEE 14TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2015: 447 - 448
  • [3] Big-Data Visualization
    Keim, Daniel
    Qu, Huamin
    Ma, Kwan-Liu
    IEEE COMPUTER GRAPHICS AND APPLICATIONS, 2013, 33 (04) : 20 - 21
  • [4] Efficacy of Bluetooth-Based Data Collection for Road Traffic Analysis and Visualization Using Big Data Analytics
    Kulkarni, Ashish Rajeshwar
    Kumar, Narendra
    Rao, K. Ramachandra
    BIG DATA MINING AND ANALYTICS, 2023, 6 (02) : 139 - 153
  • [5] Topic Modeling and Visualization for Big Data in Social Sciences
    Sukhija, Nitin
    Tatineni, Mahidhar
    Brown, Nicole
    Van Moer, Mark
    Rodriguez, Paul
    Callicott, Spencer
    2016 INT IEEE CONFERENCES ON UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING AND COMMUNICATIONS, CLOUD AND BIG DATA COMPUTING, INTERNET OF PEOPLE, AND SMART WORLD CONGRESS (UIC/ATC/SCALCOM/CBDCOM/IOP/SMARTWORLD), 2016: 1198 - 1205
  • [6] Visualization and Visual Knowledge Discovery from Big Uncertain Data
    Leung, Carson K.
    Madill, Evan W. R.
    Pazdor, Adam
    2022 26TH INTERNATIONAL CONFERENCE INFORMATION VISUALISATION (IV), 2022: 330 - 335
  • [7] Research on Data Visualization Based on Big Data
    Xu, Shasha
    Zheng, Kouquan
    Yang, Wenjing
    Sun, Yanming
    2019 4TH INTERNATIONAL WORKSHOP ON MATERIALS ENGINEERING AND COMPUTER SCIENCES (IWMECS 2019), 2019: 281 - 285
  • [8] Big Data Visualization: Tools and Challenges
    Ali, Syed Mohd
    Gupta, Noopur
    Nayak, Gopal Krishna
    Lenka, Rakesh Kumar
    PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2016: 656 - 660
  • [9] Collection, Analysis and Interactive Visualization of NetFlow Data: Experience with Big Data on the Base of the National Research Computer Network of Russia
    Abramov, A. G.
    LOBACHEVSKII JOURNAL OF MATHEMATICS, 2020, 41 (12) : 2525 - 2534
  • [10] Big Data Provenance Analysis and Visualization
    Chen, Peng
    Plale, Beth
    2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015: 797 - 800