High-speed data deduplication using parallelized cuckoo hashing

Citations: 1
Authors
Jeyaraj, Jane Rubel Angelina [1]
Kambaraj, Sundarakantham [1]
Dharmarajan, Velmurugan [1]
Affiliations
[1] Thiagarajar Coll Engn, Dept Comp Sci & Engn, Madurai, Tamil Nadu, India
Keywords
Deduplication; parallelized cuckoo; backup
DOI
10.3906/elk-1708-336
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data deduplication is a capacity optimization technology used in backup systems to identify and store only nonredundant data blocks. The CPU-intensive tasks in a hash-based deduplication system remain a challenge to improving system performance. In this paper, we propose a parallel variant of standard cuckoo hashing that enables the hashing technique to be performed in parallel. The CPU-intensive fingerprint insertion and lookup operations are performed in parallel and distributed among the nodes of the deduplication cluster. Furthermore, the uniform handling of blocks by the cluster nodes involved in duplicate identification provides good load balance. Experimental evaluations using real-world backup and Linux kernel data sets show that the proposed deduplication system achieves up to 100% higher backup speed, up to 28% lower lookup latency, and up to 24% shorter backup time than other deduplication systems.
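The general idea in the abstract (fingerprint each block, then spread cuckoo-hash insertions and lookups across nodes) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no implementation details, so everything here is an assumption made for illustration, including the names `fingerprint`, `CuckooTable`, and `dedupe`, the use of SHA-1, the two-table layout, and routing fingerprints to nodes by their trailing bytes. Worker threads stand in for cluster nodes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def fingerprint(block: bytes) -> bytes:
    """Hypothetical fingerprint function: SHA-1 digest of a block (20 bytes)."""
    return hashlib.sha1(block).digest()


class CuckooTable:
    """Textbook two-table cuckoo hash set for fingerprints (illustrative only)."""

    def __init__(self, capacity: int = 1024, max_kicks: int = 64):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which: int, fp: bytes) -> int:
        # Derive two positions from disjoint byte ranges of the fingerprint.
        start = 0 if which == 0 else 8
        return int.from_bytes(fp[start:start + 8], "big") % self.capacity

    def contains(self, fp: bytes) -> bool:
        # A lookup probes exactly one slot in each table.
        return any(self.tables[i][self._slot(i, fp)] == fp for i in (0, 1))

    def insert(self, fp: bytes) -> bool:
        """Return True if fp was new and stored, False if it was a duplicate."""
        if self.contains(fp):
            return False
        item, which = fp, 0
        for _ in range(self.max_kicks):
            idx = self._slot(which, item)
            # Place the item and pick up whatever occupied the slot.
            item, self.tables[which][idx] = self.tables[which][idx], item
            if item is None:
                return True
            which = 1 - which  # the evicted item moves to its other table
        # Simplification: a real table would rehash or resize at this point.
        raise RuntimeError("cuckoo table full")


def dedupe(blocks, num_nodes: int = 4):
    """Return one flag per block: True if it is the first copy seen."""
    nodes = [CuckooTable() for _ in range(num_nodes)]
    per_node = [[] for _ in range(num_nodes)]
    for pos, block in enumerate(blocks):
        fp = fingerprint(block)
        # Assumed routing rule: trailing fingerprint bytes select the node,
        # so identical blocks always reach the same table.
        node_id = int.from_bytes(fp[16:20], "big") % num_nodes
        per_node[node_id].append((pos, fp))

    def run(node_id: int):
        # Each table is touched by exactly one worker, so no locking is needed.
        table = nodes[node_id]
        return [(pos, table.insert(fp)) for pos, fp in per_node[node_id]]

    is_new = [False] * len(blocks)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        for result in pool.map(run, range(num_nodes)):
            for pos, new in result:
                is_new[pos] = new
    return is_new
```

For example, `dedupe([b"a", b"b", b"a"])` flags the third block as a duplicate of the first. Because SHA-1 output is close to uniform, routing by fingerprint bytes spreads blocks roughly evenly across nodes, which is one simple way to obtain the kind of load balance the abstract mentions.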
Pages: 1417-1429
Page count: 13