Deep Supervised Hashing Image Retrieval Method Based on Swin Transformer

Cited: 0
|
Authors
Miao Z. [1 ]
Zhao X. [1 ]
Li Y. [1 ]
Wang J. [1 ]
Zhang R. [1 ]
Affiliations
[1] Command and Control Engineering College, Army Engineering University of PLA, Nanjing
Source
Hunan Daxue Xuebao/Journal of Hunan University Natural Sciences | 2023, Vol. 50, No. 08
Funding
National Natural Science Foundation of China
Keywords
deep learning; hash learning; image retrieval; Swin Transformer;
DOI
10.16339/j.cnki.hdxbzkb.2023274
Abstract
Feature extraction in deep supervised hashing for image retrieval has long been dominated by convolutional neural network architectures. With the application of Transformers in the vision domain, however, it has become feasible to replace the convolutional backbone with a Transformer. To address the limitations of existing Transformer-based hashing methods, namely their inability to generate hierarchical representations and their high computational complexity, a deep supervised hashing image retrieval method based on Swin Transformer is proposed. The method adopts the Swin Transformer network model and appends a hash layer at the end of the network to generate hash codes for images. By introducing locality and hierarchy into the model, the method effectively resolves both problems. Compared with 13 existing state-of-the-art methods, the proposed method substantially improves hash retrieval performance. Experiments are carried out on two commonly used retrieval datasets, CIFAR-10 and NUS-WIDE. The results show that the proposed method achieves the highest mean average precision (mAP) of 98.4% on the CIFAR-10 dataset, an average improvement of 7.1% over the TransHash method and 0.57% over the VTS16-CSQ method. On the NUS-WIDE dataset, the proposed method achieves the highest mAP of 93.6%, an average improvement of 18.61% over TransHash and an average increase of 8.6% in retrieval accuracy over VTS16-CSQ. © 2023 Hunan University. All rights reserved.
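The retrieval pipeline the abstract describes (a backbone producing features, a hash layer producing binary codes, and ranking by Hamming distance) can be sketched as follows. This is a hypothetical illustration, not the authors' code: random features stand in for the Swin Transformer backbone output, and the linear hash layer, feature dimension, and code length are all assumed for the example.

```python
import numpy as np

# Hypothetical sketch: in the paper, a Swin Transformer backbone would
# produce one feature vector per image; random features stand in here.
rng = np.random.default_rng(0)
n_images, feat_dim, n_bits = 5, 768, 48  # dimensions assumed for illustration

features = rng.standard_normal((n_images, feat_dim))
W = rng.standard_normal((feat_dim, n_bits)) * 0.01  # assumed linear hash layer

# Training-time relaxation: tanh keeps the codes differentiable in (-1, 1).
continuous_codes = np.tanh(features @ W)

# Retrieval-time binarization: sign() yields the final binary hash codes.
binary_codes = np.sign(continuous_codes)

# Retrieval ranks database images by Hamming distance to the query code.
query = binary_codes[0]
hamming = np.sum(binary_codes != query, axis=1)
ranking = np.argsort(hamming)  # the query itself ranks first (distance 0)
```

Binary codes make retrieval fast because Hamming distance reduces to bit-count operations, which is the usual motivation for hashing-based image retrieval.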
Pages: 62-71 (9 pages)