LOFS: A Lightweight Online File Storage Strategy for Effective Data Deduplication at Network Edge

Cited by: 14
Authors
Cheng, Geyao [1]
Guo, Deke [1]
Luo, Lailong [1, 2]
Xia, Junxu [1]
Gu, Siyuan [1]
Affiliations
[1] Natl Univ Def Technol, Sci & Technol Informat Syst Engn Lab, Changsha 410073, Hunan, Peoples R China
[2] Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Changsha 410073, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Servers; Indexes; Costs; Redundancy; Throughput; Image edge detection; Resource management; Data deduplication; locality sensitivity hash; edge computing; storage systems;
DOI
10.1109/TPDS.2021.3133098
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Edge computing serves users' requests with low latency by storing the relevant files at the network edge. Various data deduplication techniques are currently employed at the edge to eliminate redundant data chunks and save space. However, detecting redundancies requires lookups in a huge global fingerprint index, which can significantly degrade data processing performance. Moreover, we envision a file storage strategy that simultaneously achieves three design rationales: 1) space efficiency, 2) access efficiency, and 3) load balance, which existing methods fail to achieve all at once. To this end, we present LOFS, a Lightweight Online File Storage strategy, which eliminates redundancies by maximizing the probability of successful data deduplication while realizing the three design rationales simultaneously. LOFS uses a lightweight three-layer hash mapping scheme to solve this problem in constant time. Specifically, LOFS employs a Bloom filter to generate a sketch for each file and then feeds the sketches to a Locality Sensitivity Hash (LSH), so that similar files are likely to be projected near one another in the LSH table space. Finally, LOFS assigns the files to real-world edge servers by jointly considering the LSH load distribution and the edge server capacity. Trace-driven experiments show that LOFS closely tracks the global deduplication ratio and yields a relatively low standard deviation of server load compared with the baseline methods.
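The three-layer mapping outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the hash functions, the LSH family, or the assignment rule, so the bit-sampling LSH, the capacity-proportional bucket ranges, and all names and constants (`M_BITS`, `K_HASHES`, `LSH_BITS`, `bloom_sketch`, `lsh_bucket`, `build_ranges`, `assign_server`) are assumptions chosen for illustration. In particular, this sketch splits the bucket space by server capacity only and does not model the observed LSH load distribution.

```python
import hashlib
import random
from bisect import bisect_right
from itertools import accumulate

# All constants below are illustrative assumptions, not values from the paper.
M_BITS = 256    # length of each per-file Bloom filter sketch
K_HASHES = 3    # bit positions set per chunk
LSH_BITS = 8    # sampled sketch bits, giving 2**8 LSH buckets

def bloom_sketch(chunks):
    """Layer 1: fold a file's chunks into an M_BITS-bit Bloom filter (as an int)."""
    bits = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()
        for i in range(K_HASHES):
            # Derive K_HASHES positions from disjoint 4-byte slices of one digest.
            pos = int.from_bytes(digest[4 * i:4 * i + 4], "big") % M_BITS
            bits |= 1 << pos
    return bits

# Fixed, seeded choice of bit positions so every file uses the same projection.
SAMPLED = random.Random(0).sample(range(M_BITS), LSH_BITS)

def lsh_bucket(sketch):
    """Layer 2: bit-sampling LSH (assumed family); sketches that differ in few
    bits agree on the sampled positions with high probability."""
    bucket = 0
    for pos in SAMPLED:
        bucket = (bucket << 1) | ((sketch >> pos) & 1)
    return bucket

def build_ranges(capacities):
    """Layer 3 helper: upper bounds of contiguous bucket ranges, sized in
    proportion to each server's capacity."""
    total = sum(capacities)
    n_buckets = 1 << LSH_BITS
    return [n_buckets * c // total for c in accumulate(capacities)]

def assign_server(bucket, bounds):
    """Layer 3: index of the server whose bucket range contains `bucket`."""
    return bisect_right(bounds, bucket)

def place(chunks, bounds):
    """Full pipeline: chunks -> Bloom sketch -> LSH bucket -> server index."""
    return assign_server(lsh_bucket(bloom_sketch(chunks)), bounds)
```

Each step is a fixed number of hash evaluations and one binary search over the server list, independent of how many files are already stored, which is the sense in which such a mapping avoids a global fingerprint-index lookup. Identical files always land on the same server in this sketch; similar files do so only with a probability that depends on the assumed parameters.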
Pages: 2263-2276
Page count: 14