Harmony: An Approach for Geo-distributed Processing of Big-Data Applications

Cited by: 6
Authors
Zhang, Han [1]
Ramapantulu, Lavanya [2]
Teo, Yong Meng [1]
Affiliations
[1] Natl Univ Singapore, Dept Comp Sci, Singapore, Singapore
[2] Int Inst Informat Technol, Comp Sci Grp, Hyderabad, India
Keywords
geo-distributed processing; data-centers; performance analysis; scheduling; MapReduce
DOI
10.1109/cluster.2019.8891053
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Big-data application processing is increasingly geo-distributed, a paradigm shift from traditional cluster-based processing frameworks. Because communication time for data movement across geo-distributed data centers is not a design criterion for traditional cluster-based processing approaches, there are research gaps in the algorithms used for staging and scheduling big-data applications on geo-distributed clusters. We address these gaps by proposing Harmony, an approach that combines staging and scheduling strategies to minimize an application's total execution time. First, the staging strategy of Harmony exploits intra-stage parallelism by running concurrent operators within a stage, in contrast to traditional Apache Spark, which uses fine-grained stages, thus reducing the computation time within each stage. Second, the scheduling strategy of Harmony reduces data transfers between geo-distributed data centers by exploiting data locality, thereby reducing communication time and total execution time. Harmony achieves a speedup of two times with respect to geo-distributed Apache Spark. In addition, Harmony achieves speedups of 1.6 times and 2.1 times over the state-of-the-art framework Iridium for geo-distributed analytics across five locations with uniform and non-uniform network link bandwidths, respectively.
Pages: 160 - 170
Page count: 11
Related papers
50 in total
  • [21] A LAHC-based Job Scheduling Strategy to Improve Big Data Processing in Geo-distributed Contexts
    Cavallo, Marco
    Di Modica, Giuseppe
    Polito, Carmelo
    Tomarchio, Orazio
    IOTBDS: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON INTERNET OF THINGS, BIG DATA AND SECURITY, 2017, : 92 - 101
  • [22] On Achieving Efficient Data Transfer for Graph Processing in Geo-Distributed Datacenters
    Zhou, Amelie Chi
    Ibrahim, Shadi
    He, Bingsheng
    2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017), 2017, : 1397 - 1407
  • [23] Time Optimization Modeling for Big Data Placement and Analysis for Geo-Distributed Data Centers
    Khan, Awais
    Attique, Muhammad
    Chung, Tae-Sun
    Kim, Youngjae
    2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016, : 140 - 141
  • [24] Multi-job Hadoop scheduling to process Geo-distributed big data
    Cavallo, Marco
    Di Modica, Giuseppe
    Polito, Carmelo
    Tomarchio, Orazio
    2017 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC), 2017, : 1175 - 1181
  • [25] Optimizing Geo-Distributed Data Processing with Resource Heterogeneity over the Internet
    Marzuni, Saeed mirpour
    Toosi, Adel
    Savadi, Abdorreza
    Naghibzadeh, Mahmud
    Taniar, David
    ACM TRANSACTIONS ON INTERNET TECHNOLOGY, 2025, 25 (01)
  • [26] A Hierarchical Hadoop Framework to Handle Big Data in Geo-Distributed Computing Environments
    Tomarchio, Orazio
    Di Modica, Giuseppe
    Cavallo, Marco
    Polito, Carmelo
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGIES AND SYSTEMS APPROACH, 2018, 11 (01) : 16 - 47
  • [27] A Survey on Geographically Distributed Big-Data Processing Using MapReduce
    Dolev, Shlomi
    Florissi, Patricia
    Gudes, Ehud
    Sharma, Shantanu
    Singer, Ido
    IEEE TRANSACTIONS ON BIG DATA, 2019, 5 (01) : 60 - 80
  • [28] Hierarchical Approach for Efficient Workload Management in Geo-Distributed Data Centers
    Forestiero, Agostino
    Mastroianni, Carlo
    Meo, Michela
    Papuzzo, Giuseppe
    Sheikhalishahi, Mehdi
    IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING, 2017, 1 (01) : 97 - 111
  • [29] Low Latency Geo-distributed Data Analytics
    Pu, Qifan
    Ananthanarayanan, Ganesh
    Bodik, Peter
    Kandula, Srikanth
    Akella, Aditya
    Bahl, Paramvir
    Stoica, Ion
    ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2015, 45 (04) : 421 - 434
  • [30] Accelerating Geo-Distributed Transaction Processing with Fast Logging
    Ogura, Takuto
    Akita, Yoshiki
    Miyazawa, Yuki
    Kawashima, Hideyuki
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 2390 - 2399