CLIP-Based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-Modal Hashing Retrieval

Cited by: 10
Authors
Li, Yewen [1 ]
Ge, Mingyuan [1 ]
Li, Mingyong [1 ]
Li, Tiansong [1 ]
Xiang, Sen [2 ]
Affiliations
[1] Chongqing Normal Univ, Sch Comp & Informat Sci, Chongqing 401331, Peoples R China
[2] Wuhan Univ Sci & Technol, Sch Informat Sci & Engn, Wuhan 430081, Peoples R China
Keywords
multi-modal retrieval; unsupervised learning; deep hashing; graph convolutional networks; attention mechanism;
DOI
10.3390/s23073439
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
With the proliferation of multi-modal data generated by various sensors, unsupervised multi-modal hashing retrieval has been extensively studied due to its advantages in storage, retrieval efficiency, and label independence. However, existing unsupervised methods still face two obstacles: (1) Because they cannot fully capture the complementary and co-occurrence information of multi-modal data, their similarity measures are inaccurate. (2) They suffer from unbalanced multi-modal learning, and the semantic structure of the data is corrupted during hash code binarization. To address these obstacles, we devise an effective CLIP-based Adaptive Graph Attention Network (CAGAN) for large-scale unsupervised multi-modal hashing retrieval. First, we use the multi-modal model CLIP to extract fine-grained semantic features, mine similarity information from different perspectives of the multi-modal data, and perform similarity fusion and enhancement. In addition, this paper proposes an adaptive graph attention network to assist the learning of hash codes, which uses an attention mechanism to learn adaptive graph similarity across modalities. It further aggregates the intrinsic neighborhood information of neighboring data nodes through a graph convolutional network to generate more discriminative hash codes. Finally, this paper employs an iterative approximate optimization strategy to mitigate the information loss in the binarization process. Extensive experiments on three benchmark datasets demonstrate that the proposed method significantly outperforms several representative hashing methods in unsupervised multi-modal retrieval tasks.
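The abstract does not give formulas, so the following is only a minimal sketch of two ideas it names: fusing per-modality similarity and approximating binarization iteratively. It assumes cosine similarity, a simple weighted-sum fusion, and a scaled tanh as the smooth surrogate for sign(); these choices, the function names, the `weight` parameter, and the random stand-in features are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of a feature matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against zero vectors
    return unit @ unit.T


def fuse_similarity(image_features, text_features, weight=0.5):
    """Weighted sum of per-modality similarities (hypothetical fusion rule)."""
    s_img = cosine_similarity_matrix(image_features)
    s_txt = cosine_similarity_matrix(text_features)
    return weight * s_img + (1.0 - weight) * s_txt


def relaxed_hash(continuous_codes, scale):
    """Smooth surrogate for sign(): tanh(scale * x) approaches sign(x) as scale grows."""
    return np.tanh(scale * continuous_codes)


rng = np.random.default_rng(0)
image_features = rng.normal(size=(4, 8))  # stand-in for CLIP image embeddings
text_features = rng.normal(size=(4, 8))   # stand-in for CLIP text embeddings
fused = fuse_similarity(image_features, text_features)

codes = rng.normal(size=(4, 16))  # stand-in for network outputs before binarization
gaps = []
for scale in (1.0, 5.0, 25.0):
    # Mean distance between the relaxed codes and the hard binary codes.
    gap = np.abs(relaxed_hash(codes, scale) - np.sign(codes)).mean()
    gaps.append(gap)
    print(f"scale={scale:>5}: mean gap to sign() = {gap:.4f}")
```

Increasing `scale` over training iterations moves the relaxed codes toward hard binary codes while keeping gradients usable early on, which is one common way to reduce the information lost at binarization.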
Pages: 19