Deep Supervised Hashing Image Retrieval Method Based on Swin Transformer

Cited by: 0
Authors
Miao Z. [1 ]
Zhao X. [1 ]
Li Y. [1 ]
Wang J. [1 ]
Zhang R. [1 ]
Affiliations
[1] Command and Control Engineering College, Army Engineering University of PLA, Nanjing
Source
Hunan Daxue Xuebao/Journal of Hunan University Natural Sciences | 2023 / Vol. 50 / No. 08
Funding
National Natural Science Foundation of China
Keywords
deep learning; hash learning; image retrieval; Swin Transformer;
DOI
10.16339/j.cnki.hdxbzkb.2023274
Abstract
Feature extraction in deep supervised hash image retrieval has long been dominated by convolutional neural network architectures. With the application of Transformers to vision tasks, however, it has become possible to replace convolutional backbones with Transformers. To address the limitations of existing Transformer-based hashing methods, namely their inability to generate hierarchical representations and their high computational complexity, a deep supervised hash image retrieval method based on the Swin Transformer is proposed. The method adopts the Swin Transformer as its backbone and appends a hash layer at the end of the network to generate hash codes for images. By introducing locality and hierarchy into the model, the method effectively resolves both problems. Experiments are carried out on two commonly used retrieval datasets, CIFAR-10 and NUS-WIDE, comparing the proposed method against 13 existing state-of-the-art methods, over which it substantially improves hash retrieval performance. The results show that the proposed method achieves a best mean average precision (mAP) of 98.4% on the CIFAR-10 dataset, an average increase of 7.1% over the TransHash method and of 0.57% over the VTS16-CSQ method. On the NUS-WIDE dataset, the proposed method achieves a best mAP of 93.6%, an average improvement of 18.61% over TransHash and an average increase of 8.6% in retrieval accuracy over VTS16-CSQ. © 2023 Hunan University. All rights reserved.
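The architecture described in the abstract can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the authors' released code: the SwinHash and hamming_rank names, the timm backbone choice, the single linear hash layer with a tanh relaxation, and sign() binarization are all common deep-hashing conventions assumed here; the paper's actual loss function and training details are not reproduced.

# Minimal sketch of the described architecture: a Swin Transformer backbone
# with a hash layer appended to emit compact binary codes.
import torch
import torch.nn as nn
import timm  # assumed dependency; ships pretrained Swin Transformer models

class SwinHash(nn.Module):
    """Hypothetical module name; illustrative only."""
    def __init__(self, num_bits: int = 64,
                 backbone: str = "swin_tiny_patch4_window7_224"):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of class logits
        self.backbone = timm.create_model(backbone, pretrained=True, num_classes=0)
        # Hash layer: project backbone features to num_bits, squash into (-1, 1)
        self.hash_layer = nn.Sequential(
            nn.Linear(self.backbone.num_features, num_bits),
            nn.Tanh(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Continuous relaxation used while training with a similarity loss
        return self.hash_layer(self.backbone(x))

    @torch.no_grad()
    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Binarize with sign() at indexing/query time to get {-1, +1} codes
        return torch.sign(self.forward(x))

# Retrieval by Hamming ranking: for {-1, +1}^k codes, the Hamming distance
# equals (k - <q, d>) / 2, so sorting by inner product gives the same order.
def hamming_rank(query_codes: torch.Tensor, db_codes: torch.Tensor) -> torch.Tensor:
    k = query_codes.shape[1]
    dist = 0.5 * (k - query_codes @ db_codes.t())
    return dist.argsort(dim=1)  # nearest database items first, per query

The tanh output keeps the training objective differentiable; binarization happens only when codes are stored or queried, which is the standard relaxation in deep supervised hashing.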
Pages: 62-71
Page count: 9
References
44 in total
[21]
HE K M, ZHANG X Y, REN S Q, et al., Deep residual learning for image recognition[C], 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, (2016)
[22]
TAN M X, LE Q V., EfficientNet: rethinking model scaling for convolutional neural networks[EB/OL], (2019)
[23]
ZHU H, LONG M S, WANG J M, et al., Deep hashing network for efficient similarity retrieval[C], Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 2415-2421, (2016)
[24]
ZHANG Z, ZOU Q, LIN Y W, et al., Improved deep hashing with soft pairwise similarity for multi-label image retrieval[J], IEEE Transactions on Multimedia, 22, 2, pp. 540-553, (2020)
[25]
LI Y, MIAO Z, WANG J B, et al., Deep discriminative supervised hashing via Siamese network[J], IEICE Transactions on Information and Systems, E100, 12, pp. 3036-3040, (2017)
[26]
CAO Y, LIU B, LONG M S, et al., HashGAN: deep learning to hash with pair conditional Wasserstein GAN[C], 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1287-1296, (2018)
[27]
LI Q, SUN Z N, HE R, et al., Deep supervised discrete hashing[C], Advances in Neural Information Processing Systems 30, pp. 2482-2491, (2017)
[28]
CAO Z J, LONG M S, WANG J M, et al., HashNet: deep learning to hash by continuation[C], 2017 IEEE International Conference on Computer Vision (ICCV), pp. 5609-5618, (2017)
[29]
SU S P, ZHANG C, HAN K, et al., Greedy hash: towards fast optimization for accurate hash coding in CNN[C], Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 806-815, (2018)
[30]
VASWANI A, SHAZEER N, PARMAR N, et al., Attention is all you need[C], Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6000-6010, (2017)