Big Data: from collection to visualization

Cited by: 0
Authors
Mohammed Ghesmoune
Hanene Azzag
Salima Benbernou
Mustapha Lebbah
Tarn Duong
Mourad Ouziri
Affiliations
[1] University of Paris 13, LIPN
[2] Sorbonne Paris City, UMR 7030
[3] University of Paris Descartes, CNRS
[4] Sorbonne Paris City, LIPADE
Source
Machine Learning | 2017 / Vol. 106
Keywords
Data fusion; RDF; Semantic; Entity resolution; Big data; Map-Reduce; Spark; Data stream clustering; Micro-Batch streaming; GNG; Topological structure; Visualization
DOI
Not available
Abstract
Organisations increasingly rely on Big Data to discover correlations and patterns that would previously have remained hidden, and to use this new information to improve the quality of their business activities. In this paper we present a ‘story’ of Big Data from initial data collection to final visualization, passing through data fusion, analysis and clustering. To this end, we present a complete workflow covering (a) how to represent the heterogeneous collected data using the high performance RDF language, how to fuse the Big Data in RDF by resolving the issue of entity disambiguation, and how to query those data to provide more relevant and complete knowledge; and (b) since the data are received as data streams, batchStream, a micro-batching version of the growing neural gas (GNG) approach, which is capable of clustering data streams with a single pass over the data. The batchStream algorithm allows us to discover clusters of arbitrary shape without any assumption on the number of clusters. This Big Data workflow is implemented on the Spark platform, and we demonstrate it on synthetic and real data.
Pages: 837-862
Page count: 25
Related papers
50 items in total
  • [41] Visualization as a mean of Big Data Management: Using Qatar's Electricity Consumption Data
    Soliman, Engy
    Fetais, Noora
    2017 9TH IEEE-GCC CONFERENCE AND EXHIBITION (GCCCE), 2018, : 400 - 405
  • [42] Hypergraph visualization and enrichment statistics: how the EGAN paradigm facilitates organic discovery from Big Data
    Paquette, Jesse
    Tokuyasu, Taku
    HUMAN VISION AND ELECTRONIC IMAGING XVI, 2011, 7865
  • [43] Trend Visualization of Academic Field: Proposed Method and Big Data Review
    Antonov E.V.
    Artamonov A.A.
    Rudik A.V.
    Malugin M.I.
    Scientific Visualization, 2022, 14 (02): : 62 - 76
  • [44] Big data visualization identifies the multidimensional molecular landscape of human gliomas
    Bolouri, Hamid
    Zhao, Lue Ping
    Holland, Eric C.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2016, 113 (19) : 5394 - 5399
  • [45] Study on Big Data Visualization of Joint Operation Command and Control System
    Liu, Gang
    Su, Yi
    BIG DATA - BIGDATA 2018, 2018, 10968 : 372 - 380
  • [46] Augmented Reality for Big Data Visualization: A Review
    Chandra, Ananth N. Ramaseri
    El Jamiy, Fatima
    Reza, Hassan
    2019 6TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI 2019), 2019, : 1269 - 1274
  • [47] Big Data Visualization in Cardiology-A Systematic Review and Future Directions
    Nazir, Shah
    Khan, Muhammad Nawaz
    Anwar, Sajid
    Adnan, Awais
    Asadi, Shahla
    Shahzad, Sara
    Ali, Shaukat
    IEEE ACCESS, 2019, 7 : 115945 - 115958
  • [48] Visualization of Big Spatial Data Using Coresets for Kernel Density Estimates
    Zheng, Yan
    Ou, Yi
    Lex, Alexander
    Phillips, Jeff M.
    IEEE TRANSACTIONS ON BIG DATA, 2021, 7 (03) : 524 - 534
  • [49] DEEPEYE: An Automatic Big Data Visualization Framework
    Xuedi Qin
    Yuyu Luo
    Nan Tang
    Guoliang Li
    Big Data Mining and Analytics, 2018, (01) : 75 - 82
  • [50] A System for Monitoring and Visualization of Big Mobility Data
    Meskovic, E.
    Osmanovic, D.
    2018 41ST INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2018, : 1086 - 1091