Distributed Gaussian Mixture Model Summarization Using the MapReduce Framework

Authors
Esmaeilpour, Arina [1 ,2 ]
Bigdeli, Elnaz [2 ,3 ]
Cheraghchi, Fatemeh [2 ,3 ]
Raahemi, Bijan [2 ]
Far, Behrouz H. [1 ]
Affiliations
[1] Univ Calgary, Dept Elect & Comp Engn, 2500 Univ Dr NW, Calgary, AB, Canada
[2] Univ Ottawa, Knowledge Discovery & Data Min Lab, Telfer Sch Management, 55 Laurier Ave E, Ottawa, ON, Canada
[3] Univ Ottawa, Dept Comp Sci, 600 King Edward, Ottawa, ON, Canada
Source
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2016 | 2016 / Vol. 9673
Keywords
Distributed density-based clustering; Distributed cluster summarization; Gaussian mixture model; MapReduce; CLUSTERING-ALGORITHM; MR-DBSCAN;
DOI
10.1007/978-3-319-34111-8_39
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With an accelerating rate of data generation, sophisticated techniques are essential to meet scalability requirements. One of the promising avenues for handling large datasets is distributed storage and processing. Further, data summarization is a useful concept for managing large datasets, wherein a subset of the data can be used to provide an approximate yet useful representation. Consolidation of these tools can allow a distributed implementation of data summarization. In this paper, we achieve this by proposing and implementing a distributed Gaussian Mixture Model Summarization using the MapReduce framework (MR-SGMM). In MR-SGMM, we partition input data, cluster the data within each partition with a density-based clustering algorithm called DBSCAN, and for all clusters we discover SGMM core points and their features. We test the implementation with synthetic and real datasets to demonstrate its validity and efficiency. This paves the way for a scalable implementation of Summarization using Gaussian Mixture Model (SGMM).
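The pipeline described in the abstract (partition the data, run DBSCAN inside each partition, then summarize each cluster) can be illustrated with a minimal single-process sketch. This is not the authors' MR-SGMM implementation. The abstract does not say how SGMM core points are chosen, so the sketch substitutes one diagonal Gaussian (mean and per-dimension variance) per cluster, and it does not merge clusters that span partition boundaries. The names `map_partition`, `summarize`, and `reduce_summaries` are hypothetical.

```python
import math
from collections import defaultdict


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def summarize(members):
    """Assumed stand-in for SGMM features: one diagonal Gaussian per cluster."""
    n, dim = len(members), len(members[0])
    mean = tuple(sum(p[d] for p in members) / n for d in range(dim))
    var = tuple(sum((p[d] - mean[d]) ** 2 for p in members) / n for d in range(dim))
    return {"weight": n, "mean": mean, "var": var}


def map_partition(partition_id, points, eps, min_pts):
    """Map step: cluster one partition, emit ((partition, cluster), summary)."""
    groups = defaultdict(list)
    for p, label in zip(points, dbscan(points, eps, min_pts)):
        if label != -1:
            groups[label].append(p)
    for label, members in groups.items():
        yield (partition_id, label), summarize(members)


def reduce_summaries(mapped):
    """Reduce step: gather all summaries and normalize counts into mixture weights."""
    components = [summary for _, summary in mapped]
    total = sum(c["weight"] for c in components)
    return [{**c, "weight": c["weight"] / total} for c in components]


# Toy data: two partitions, each holding one dense cluster; (5.0, 5.0) is noise.
partitions = {
    0: [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)],
    1: [(10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1)],
}
mapped = [kv for pid, pts in partitions.items()
          for kv in map_partition(pid, pts, eps=0.5, min_pts=3)]
mixture = reduce_summaries(mapped)
```

In a real MapReduce job, each `map_partition` call would run on a separate worker and only the compact summaries, not the raw points, would be shuffled to the reducer.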
Pages: 323 - 335
Page count: 13
Related papers
50 items in total
  • [1] Distributed MapReduce Framework using Distributed Hash Table
    Chiu, Chuan-Feng
    Hsu, Steen J.
    Jan, Sen-Ren
    2013 INTERNATIONAL JOINT CONFERENCE ON AWARENESS SCIENCE AND TECHNOLOGY & UBI-MEDIA COMPUTING (ICAST-UMEDIA), 2013, : 475 - 480
  • [2] A Distributed Framework for Event Log Analysis using MapReduce
    Dewangan, Sandeep Kumar
    Pandey, Shikha
    Verma, Toran
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION CONTROL AND COMPUTING TECHNOLOGIES (ICACCCT), 2016, : 503 - 506
  • [3] Summarization using Mapreduce Framework based Big Data and Hybrid Algorithm (HMM and DBSCAN)
    Belerao, Krushnadeo Tanaji
    Chaudhari, S. B.
    2017 IEEE INTERNATIONAL CONFERENCE ON POWER, CONTROL, SIGNALS AND INSTRUMENTATION ENGINEERING (ICPCSI), 2017, : 377 - 380
  • [4] LOW COMPLEXITY ON-LINE VIDEO SUMMARIZATION WITH GAUSSIAN MIXTURE MODEL BASED CLUSTERING
    Ou, Shun-Hsing
    Lee, Chia-Han
    Somayazulu, V. Srinivasa
    Chen, Yen-Kuang
    Chien, Shao-Yi
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [5] Chemical gas leakage source determination using distributed EM algorithm with Gaussian mixture model
    Yong, Z.
    Liyi, Z.
    Li, W.
    Jianfeng, H.
    Zhe, B.
    BULGARIAN CHEMICAL COMMUNICATIONS, 2016, 48 : 108 - 116
  • [6] Superpixel Segmentation Using Gaussian Mixture Model
    Ban, Zhihua
    Liu, Jianguo
    Cao, Li
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (08) : 4105 - 4117
  • [7] Using the Gini Index for a Gaussian Mixture Model
    Laura Lopez-Lobato, Adriana
    Lorena Avendano-Garrido, Martha
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, MICAI 2020, PT II, 2020, 12469 : 403 - 418
  • [8] Distributed CTL model checking using MapReduce: theory and practice
    Bellettini, Carlo
    Camilli, Matteo
    Capra, Lorenzo
    Monga, Mattia
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (11) : 3025 - 3041
  • [9] Transfer-Learning-Based Gaussian Mixture Model for Distributed Clustering
    Wang, Rongrong
    Han, Shiyuan
    Zhou, Jin
    Chen, Yuehui
    Wang, Lin
    Du, Tao
    Ji, Ke
    Zhao, Ya-ou
    Zhang, Kun
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (11) : 7058 - 7070
  • [10] Distributed personalized imputation based on Gaussian mixture model for missing data
    Chen S.
    Liu Y.
    Neural Computing and Applications, 2024, 36 (23) : 14237 - 14250