Structure-aware contrastive hashing for unsupervised cross-modal retrieval

Times cited: 14
Authors
Cui, Jinrong [1 ]
He, Zhipeng [1 ]
Huang, Qiong [1 ,3 ]
Fu, Yulu [1 ]
Li, Yuting [1 ]
Wen, Jie [2 ]
Affiliations
[1] South China Agr Univ, Coll Math & Informat, Guangzhou 510642, Peoples R China
[2] Harbin Inst Technol, Shenzhen Key Lab Visual Object Detect & Recognit, Shenzhen 518055, Peoples R China
[3] Guangzhou Key Lab Intelligent Agriculture, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimedia retrieval; Unsupervised deep hashing; Cross-modal retrieval; Binary code learning; NETWORK;
DOI
10.1016/j.neunet.2024.106211
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Cross-modal hashing has attracted a lot of attention and achieved remarkable success in large-scale cross-media similarity retrieval applications because of its superior computational efficiency and low storage overhead. However, constructing similarity relationships among samples in unsupervised cross-modal hashing is challenging because of the lack of manual annotation. Most existing unsupervised methods directly use the representations extracted from the backbone of each modality to construct instance similarity matrices, leading to inaccurate similarity matrices and thus to suboptimal hash codes. To address this issue, a novel unsupervised hashing model, named Structure-aware Contrastive Hashing for Unsupervised Cross-modal Retrieval (SACH), is proposed in this paper. Specifically, we concurrently employ both the high-dimensional representations and the discriminative representations learned by the network to construct a more informative semantic correlation matrix across modalities. Moreover, we design a multimodal structure-aware alignment network to minimize the heterogeneity gap in the high-order semantic space of each modality, effectively reducing disparities within heterogeneous data sources and enhancing the consistency of semantic information across modalities. Extensive experimental results on two widely used datasets demonstrate the superiority of the proposed SACH method over existing state-of-the-art methods in cross-modal retrieval tasks.
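The abstract outlines two components: a cross-modal semantic correlation matrix fused from high-dimensional backbone features and learned discriminative representations, and a contrastive alignment objective across modalities. Since this record does not include the paper's formulas, the PyTorch sketch below is only a hypothetical illustration of those two ideas, not the authors' implementation; every name (fused_similarity, contrastive_alignment) and hyperparameter (the fusion weight eta, temperature tau) is invented for the example.

```python
# Hypothetical sketch of the ideas summarized in the abstract; shapes,
# names, and the fusion scheme are assumptions, not the paper's method.
import torch
import torch.nn.functional as F

def fused_similarity(feat_high, feat_disc, eta=0.5):
    """Blend cosine similarities computed from high-dimensional backbone
    features and lower-dimensional discriminative representations
    (assumed fusion; the paper's exact construction may differ)."""
    s_high = F.normalize(feat_high, dim=1) @ F.normalize(feat_high, dim=1).T
    s_disc = F.normalize(feat_disc, dim=1) @ F.normalize(feat_disc, dim=1).T
    return eta * s_high + (1.0 - eta) * s_disc  # (batch, batch) matrix

def contrastive_alignment(img_emb, txt_emb, tau=0.1):
    """Standard InfoNCE-style cross-modal alignment: paired image/text
    embeddings are pulled together, unpaired ones pushed apart (used here
    as a stand-in for the paper's structure-aware alignment loss)."""
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = img @ txt.T / tau                      # (batch, batch) scores
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Toy usage with random features standing in for backbone outputs.
B, D_high, D_disc = 8, 512, 64
img_high, txt_high = torch.randn(B, D_high), torch.randn(B, D_high)
img_disc, txt_disc = torch.randn(B, D_disc), torch.randn(B, D_disc)
S_img = fused_similarity(img_high, img_disc)   # intra-modal correlation
S_txt = fused_similarity(txt_high, txt_disc)
loss = contrastive_alignment(img_disc, txt_disc)
codes = torch.sign(img_disc)                   # binarize to hash codes (illustrative)
```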
Pages: 10
References
36 records in total
[1]   Cognitive multi-modal consistent hashing with flexible semantic transformation [J].
An, Junfeng ;
Luo, Haoyang ;
Zhang, Zheng ;
Zhu, Lei ;
Lu, Guangming .
INFORMATION PROCESSING & MANAGEMENT, 2022, 59 (01)
[2]  
Ba J, 2014, ACS SYM SER
[3]  
Chua T.-S., 2009, Proceedings of the ACM International Conference on Image and Video Retrieval
[4]  
Yang Dejie, 2020, ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval, P44, DOI 10.1145/3372278.3390673
[5]   Fast Supervised Discrete Hashing [J].
Gui, Jie ;
Liu, Tongliang ;
Sun, Zhenan ;
Tao, Dacheng ;
Tan, Tieniu .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (02) :490-496
[6]   Supervised Discrete Hashing With Relaxation [J].
Gui, Jie ;
Liu, Tongliang ;
Sun, Zhenan ;
Tao, Dacheng ;
Tan, Tieniu .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (03) :608-617
[7]   Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing [J].
Hu, Hengtong ;
Xie, Lingxi ;
Hong, Richang ;
Tian, Qi .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :3120-3129
[8]   Incomplete multi-view clustering network via nonlinear manifold embedding and probability-induced loss [J].
Huang, Cheng ;
Cui, Jinrong ;
Fu, Yulu ;
Huang, Dong ;
Zhao, Min ;
Li, Lusi .
NEURAL NETWORKS, 2023, 163 :233-243
[9]  
Huiskes M.J., 2008, Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, P39, DOI 10.1145/1460096.1460104
[10]  
Jia WZ, 2023, Proceedings of the AAAI Conference on Artificial Intelligence, P1007