Topology-Aware Optimal Data Placement Algorithm for Network Traffic Optimization

Cited by: 23
Authors
Chen, Wuhui [1]
Paik, Incheon [1]
Li, Zhenni [1]
Affiliation
[1] Univ Aizu, Sch Comp Sci & Engn, Fukushima, Japan
Keywords
Geo-distributed data centers; MapReduce; optimal data allocation; topology-aware; heuristic algorithm
DOI
10.1109/TC.2015.2485230
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We propose a new optimal data placement technique to improve the performance of MapReduce in cloud data centers by considering not only the data locality but also the global data access costs. We first conducted an analytical and experimental study to identify the performance issues of MapReduce in data centers and to show that MapReduce tasks that are involved in unexpected remote data access have much greater communication costs and execution time, and can significantly deteriorate the overall performance. Next, we formulated the problem of optimal data placement and proposed a generative model to minimize global data access cost in data centers and showed that the optimal data placement problem is NP-hard. To solve the optimal data placement problem, we propose a topology-aware heuristic algorithm by first constructing a replica-balanced distribution tree for the abstract tree structure, and then building a replica-similarity distribution tree for detail tree construction, to construct an optimal replica distribution tree. The experimental results demonstrated that our optimal data placement approach can improve the performance of MapReduce with lower communication and computation costs by effectively minimizing global data access costs, more specifically reducing unexpected remote data access.
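The abstract does not state the cost model, so the following is only a toy illustration of what "global data access cost" could mean for a placement, not the authors' formulation or their heuristic. It assumes a hypothetical two-rack tree topology (`RACK`), an assumed hop-count distance (`hops`), and a cost equal to block size times hop distance summed over all task reads. The exhaustive search in `best_placement` is a stand-in that works only for tiny inputs; since the problem is NP-hard, a heuristic such as the one proposed in the paper is needed at scale.

```python
from itertools import product

# Hypothetical two-rack topology (node -> rack); not taken from the paper.
RACK = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}

# Hypothetical workload: (node running the task, block it reads, block size in MB).
TASKS = [("n1", "b1", 64), ("n2", "b1", 64), ("n3", "b2", 128), ("n4", "b1", 64)]


def hops(a, b):
    """Assumed distance: 0 on the same node, 2 within a rack, 4 across racks."""
    if a == b:
        return 0
    return 2 if RACK[a] == RACK[b] else 4


def access_cost(placement, tasks):
    """Sum over all task reads of block size times hop distance to the block."""
    return sum(size * hops(node, placement[block]) for node, block, size in tasks)


def best_placement(blocks, tasks):
    """Exhaustive search over all placements; feasible only for tiny inputs."""
    nodes = sorted(RACK)
    best = None
    for combo in product(nodes, repeat=len(blocks)):
        placement = dict(zip(blocks, combo))
        cost = access_cost(placement, tasks)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


if __name__ == "__main__":
    naive = {"b1": "n3", "b2": "n1"}  # blocks placed far from their readers
    print("naive cost:", access_cost(naive, TASKS))
    print("best (cost, placement):", best_placement(["b1", "b2"], TASKS))
```

In this toy setup, placing each block near the tasks that read it lowers the total cost because fewer reads cross racks, which is the kind of remote access the abstract identifies as the main performance problem.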
Pages: 2603-2617
Page count: 15