Analysis of Massive Industrial Data using MapReduce Framework for Parallel Processing

Cited by: 0
Authors
Aly, Mohab [1]
Yacout, Soumaya [1]
Shaban, Yasser [2]
Affiliations
[1] Ecole Polytech Montreal, Dept Ind Engn, CP 6079, Succ Ctr Ville, Montreal, PQ H3C 3A7, Canada
[2] Helwan Univ, Dept Mech Design Engn, POB 11718, Cairo, Egypt
Source
2017 ANNUAL RELIABILITY AND MAINTAINABILITY SYMPOSIUM | 2017
Keywords
Cloud Computing; Big Data; MapReduce; Parallel Processing; Data mining;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
With the emergence of the 'Big Data' paradigm, more and more industrial data are becoming available to practitioners and professionals. These data are being generated ever faster thanks to advances in information technology. For reliability and maintenance engineers, 'Big Data' is a valuable source of information: if analyzed correctly, it can yield a knowledge base that supports decision making in an industrial organization. The availability of 'Big Data' is opening a new area of research dedicated to the analysis of such data. This paper shows how to analyze massive amounts of data generated by one or more industrial systems. Such data may range from terabytes to petabytes in size. Analysis at that scale cannot be performed on a single commodity computer, because the data may exceed the machine's memory and processing resources; even if the data fit, the analysis would take an unacceptable amount of time. Processing large volumes of industrial data therefore requires high-performance analytical systems running in distributed environments. Different algorithms can be considered for this analysis. Cloud computing models provide the scalable and flexible infrastructure needed to adapt standard analytics algorithms to a distributed setting. We introduce a new distributed training technique that combines MapReduce, a now widely used framework for big dataflow, with traditional machine learning building blocks such as matrix multiplication and linear regression. Parallel processing of these computations relies on different algorithms adapted to MapReduce and its framework.
Our platform is deployed on top of Google Cloud Platform (App Engine and Compute Engine). We also consider Amazon EMR cloud services, to see how the resources provisioned by each can be used to speed up the analysis of, and the extraction of useful information from, massive industrial data, i.e., to reduce computational time.
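The abstract does not describe the algorithms themselves. As a rough illustration of how linear regression can be cast in a map/reduce shape, the following minimal Python sketch fits a straight line by having each data partition emit partial sums (map) that are then added together (reduce). This is a common summation-style formulation and is an assumption here, not the authors' method; the function names `map_partition`, `combine`, and `fit_line` and the sample data are hypothetical, and everything runs in a single process rather than on a real MapReduce cluster.

```python
from functools import reduce


def map_partition(rows):
    """Map step: reduce one partition of (x, y) pairs to partial sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: add two tuples of partial sums element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(partitions):
    """Fit y = slope * x + intercept from the combined sums."""
    zero = (0.0, 0.0, 0.0, 0.0, 0.0)
    n, sx, sy, sxx, sxy = reduce(combine, map(map_partition, partitions), zero)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data: two partitions of points lying on y = 2x + 1.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]
slope, intercept = fit_line(partitions)
print(slope, intercept)
```

Because the partial sums are associative and commutative, the partitions can in principle be processed on different workers in any order, which is what makes this formulation a natural fit for a MapReduce-style framework.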
Pages: 6