Fragmenting Big Data to boost the performance of MapReduce in geographical computing contexts

Cited by: 3
Authors
Cavallo, Marco [1]
Di Modica, Giuseppe [1]
Polito, Carmelo [1]
Tomarchio, Orazio [1]
Affiliations
[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy
Source
2017 3RD INTERNATIONAL CONFERENCE ON BIG DATA INNOVATIONS AND APPLICATIONS (INNOVATE-DATA) | 2017
Keywords
Big Data; MapReduce; Data fragmentation; Geographical computing environment; Hierarchical Hadoop;
DOI
10.1109/Innovate-Data.2017.12
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The last few years have seen a growing demand for distributed Cloud infrastructures able to process big data generated by geographically scattered sources. A key challenge in this environment is how to manage big data across multiple heterogeneous datacenters interconnected through imbalanced network links. We designed a Hierarchical Hadoop Framework (H2F) in which top-level business logic smartly schedules bottom-level computing tasks that exploit the potential of MapReduce within each datacenter. In this work we discuss the opportunity of fragmenting big data into small pieces so that better workload configurations may be devised for the bottom-level tasks. Several case study experiments were run on a testbed where a software prototype of the designed framework was deployed. The test results are reported and discussed in the last part of the paper.
Pages: 17-24
Number of pages: 8