A fast and efficient large-scale near duplicate image retrieval system using double perceptual hashing

Cited by: 1

Authors:

Subudhi, Priyambada [1]

Kumari, Kirti [2]

Affiliations:

[1] Indian Inst Informat Technol, Dept Comp Sci & Engn, Sri City, India

[2] Indian Inst Informat Technol, Dept Comp Sci & Engn, Ranchi, India

Source:

SIGNAL IMAGE AND VIDEO PROCESSING | 2024 / Vol. 18 / Issue 12

Keywords:

Near duplicate image retrieval; Perceptual hashing; Hash code partitioning; Hamming distance;

DOI:

10.1007/s11760-024-03490-w

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Subject classification code:

0808 ; 0809 ;

Abstract:

With the ever-increasing volume of digital images available online, identifying similar images quickly and accurately has become important across a variety of domains. Perceptual hashing is known to be the most widely used method for such near-duplicate image retrieval. While content-based features provide superior accuracy in detecting similar images, hash codes derived from these features reduce storage requirements and improve time efficiency. However, as image volume increases, the computational complexity of perceptual hashing poses a challenge. Another significant challenge is the robustness of perceptual hash functions against adversarial manipulations. To address these issues and improve the accuracy of near-duplicate image retrieval, this paper proposes a double perceptual hashing approach. The primary hash performs coarse matching and retrieves all images relevant to the query image. A secondary hash then performs fine matching by eliminating the false positives returned by the primary hash. While the dual hash functions enhance robustness, a further novel strategy of partitioning the primary hash into equal-sized segments boosts storage efficiency and speeds up search more than tenfold compared with the naive approach. Experimental results on the Copydays dataset augmented with 30,000 random images show an average mAP of 0.89 and an average response time of 0.101 s, verifying the method's efficiency on large datasets.
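
The coarse-then-fine lookup with a partitioned primary hash can be illustrated with a minimal sketch. The abstract does not state which hash functions, code lengths, segment counts, or distance thresholds the authors use, so everything below is an assumption: 64-bit integer codes, four 16-bit segments, and the hypothetical names `DoubleHashIndex`, `coarse_radius`, and `fine_radius`. The candidate lookup relies on the pigeonhole principle (if two codes differ in fewer bits than there are segments, at least one segment matches exactly), which is one common way to exploit segment partitioning and may differ from the paper's scheme.

```python
from collections import defaultdict

SEGMENTS = 4        # assumed number of equal-sized segments
SEGMENT_BITS = 16   # assumed segment width, giving 64-bit primary hashes


def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def split_segments(code):
    """Split a code into SEGMENTS equal-sized chunks, lowest bits first."""
    mask = (1 << SEGMENT_BITS) - 1
    return [(code >> (i * SEGMENT_BITS)) & mask for i in range(SEGMENTS)]


class DoubleHashIndex:
    """Hypothetical two-stage index; not the authors' implementation."""

    def __init__(self):
        # One lookup table per segment position: segment value -> image ids.
        self.tables = [defaultdict(set) for _ in range(SEGMENTS)]
        self.hashes = {}  # image id -> (primary hash, secondary hash)

    def add(self, image_id, primary, secondary):
        self.hashes[image_id] = (primary, secondary)
        for pos, seg in enumerate(split_segments(primary)):
            self.tables[pos][seg].add(image_id)

    def query(self, primary, secondary, coarse_radius=3, fine_radius=8):
        # The pigeonhole guarantee only holds when coarse_radius < SEGMENTS.
        assert coarse_radius < SEGMENTS
        # Candidate lookup: images sharing at least one exact segment.
        candidates = set()
        for pos, seg in enumerate(split_segments(primary)):
            candidates |= self.tables[pos].get(seg, set())
        # Coarse matching: full Hamming check on the primary hash.
        coarse = [i for i in candidates
                  if hamming(primary, self.hashes[i][0]) <= coarse_radius]
        # Fine matching: the secondary hash removes false positives.
        return sorted(i for i in coarse
                      if hamming(secondary, self.hashes[i][1]) <= fine_radius)


index = DoubleHashIndex()
q_primary, q_secondary = 0x0123456789ABCDEF, 0xFFFF0000FFFF0000
index.add("near_dup", q_primary ^ 0b101, q_secondary ^ 0b1)
index.add("false_pos", q_primary ^ 0b1, q_secondary ^ 0xFFFFFFFF)
index.add("unrelated", ~q_primary & (2**64 - 1), q_secondary)
result = index.query(q_primary, q_secondary)
```

In this toy run, `false_pos` passes the primary check but is rejected by the secondary hash, while `unrelated` shares no segment with the query and is never compared at all, which is where the speed-up over a linear scan would come from.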

Pages: 8565-8575

Number of pages: 11