EF-dedup: Enabling Collaborative Data Deduplication at the Network Edge

Cited by: 13
Authors
Li, Shijing [1]
Lan, Tian [1]
Balasubramanian, Bharath [2]
Ra, Moo-Ryong [2]
Lee, Hee Won [2]
Panta, Rajesh [2]
Affiliations
[1] George Washington Univ, Washington, DC 20052 USA
[2] AT&T Res Lab, Washington, DC USA
Source
2019 39th IEEE International Conference on Distributed Computing Systems (ICDCS 2019) | 2019
Keywords
Deduplication; Edge Computing; Edge Networks; Distributed Storage; Cloud Storage
DOI
10.1109/ICDCS.2019.00102
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The advent of IoT and edge computing will lead to massive amounts of data that need to be collected and transmitted to online storage systems. To address this problem, we push data deduplication to the network edge. Specifically, we propose a new technique for collaborative edge-facilitated deduplication (EF-dedup), wherein we partition the resource-constrained edge nodes into disjoint clusters, maintain a deduplication index structure for each cluster using a distributed key-value store, and perform decentralized deduplication within those clusters. This is a challenging partitioning problem that involves a novel tradeoff: edge nodes with highly correlated data may not always be within the same edge cloud, and the network cost among them is non-trivial. We address this challenge by first formulating an optimization problem to partition the edge nodes, considering both the data similarities across the nodes and the inter-node network cost. We prove that the problem is NP-hard, provide bounded heuristics to solve it, and build a prototype EF-dedup system. Our experiments on EF-dedup, performed on edge nodes in the AT&T research lab and a central cloud at AWS, demonstrate that EF-dedup achieves 38.3-118.5% better deduplication throughput than purely cloud-based techniques and 43.4-60.2% lower aggregate cost in terms of the network-storage tradeoff compared to approaches that solely favor one over the other.
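The per-cluster deduplication index described in the abstract can be illustrated with a minimal sketch. This is not the EF-dedup implementation: the fixed-size chunking, the SHA-256 fingerprints, and the names `ClusterIndex`, `chunk_fingerprints`, and `dedup` are all assumptions made for illustration, and an in-memory set stands in for the distributed key-value store.

```python
import hashlib


def chunk_fingerprints(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and yield (fingerprint, chunk) pairs.

    Fixed-size chunking is an assumption; the abstract does not say
    how chunks are formed.
    """
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk


class ClusterIndex:
    """Hypothetical stand-in for one cluster's shared deduplication index.

    A real deployment would keep this in a distributed key-value store
    shared by the edge nodes of the cluster; a local set is used here.
    """

    def __init__(self):
        self._seen = set()

    def add_if_absent(self, fingerprint: str) -> bool:
        """Record the fingerprint; return True only if it was new."""
        if fingerprint in self._seen:
            return False
        self._seen.add(fingerprint)
        return True


def dedup(data: bytes, index: ClusterIndex, chunk_size: int = 4):
    """Return only the chunks not yet known to the cluster index."""
    return [
        chunk
        for fingerprint, chunk in chunk_fingerprints(data, chunk_size)
        if index.add_if_absent(fingerprint)
    ]


# Two edge nodes in the same cluster share one index.
index = ClusterIndex()
node_a_upload = dedup(b"AAAABBBBCCCC", index)
node_b_upload = dedup(b"BBBBCCCCDDDD", index)
```

In this toy run the second node forwards only the chunk the first node has not already contributed, which is why grouping nodes with similar data into the same cluster reduces what must be sent to the cloud.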
Pages: 986-996
Page count: 11