Survey of Distributed Computing Frameworks for Supporting Big Data Analysis

Citations: 18
Authors
Sun, Xudong [1]
He, Yulin [1, 2]
Wu, Dingming [1]
Huang, Joshua Zhexue [1, 2]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[2] Guangdong Lab Artificial Intelligence & Digital Ec, Shenzhen 518107, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Analytical models; Costs; Computational modeling; Clustering algorithms; Distributed databases; Big Data; Programming; distributed computing frameworks; big data analysis; approximate computing; MapReduce computing model; MAP-REDUCE; MAPREDUCE; PERFORMANCE; MANAGEMENT; HADOOP; TAXONOMY; SYSTEMS;
DOI
10.26599/BDMA.2022.9020014
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or in the cloud. The size of big data is growing faster than the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks, which often require running complex analytical algorithms on extremely large data sets measured in terabytes. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limits, and a limited set of analytical algorithms, because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to overcome these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.
Pages: 154-169
Page count: 16