Deep Multi-Modal Hashing With Semantic Enhancement for Multi-Label Micro-Video Retrieval

Cited by: 1
Authors
Jing, Peiguang [1]
Sun, Haoyi [2]
Nie, Liqiang [3]
Li, Yun [4, 5]
Su, Yuting [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Tianjin Univ, Sch Future Technol, Tianjin 300072, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[4] Guangxi Univ Finance & Econ, Sch Big Data & Artificial Intelligence, Guangxi 530001, Peoples R China
[5] Guangxi Key Lab Big Data Finance & Econ, Nanning 530001, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semantics; Hash functions; Encoding; Representation learning; Convolutional neural networks; Quantization (signal); Kernel; Deep hashing; micro-video retrieval; multi-label; multi-modality; MAXIMUM-LIKELIHOOD; QUANTIZATION;
DOI
10.1109/TKDE.2023.3337077
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The pressing need for low storage and high efficiency has significantly propelled the advancement of deep hashing techniques for large-scale search and retrieval tasks. As one of the most prevalent forms of user-generated content, micro-videos usually exhibit more complicated multi-modal behaviors, which become even more challenging in multi-label retrieval. Existing multi-modal hashing methods tend to prioritize complementarity and consistency in multi-modal fusion while neglecting the completeness problem. In this paper, we propose a deep multi-modal hashing with semantic enhancement (DMHSE) method that effectively integrates complete multi-modal representation learning with discriminative binary coding through collaboration between two distinct encoders, FoldCoder and HashCoder. FoldCoder translates latent multi-modal representation learning into a degradation process by mimicking data transmission. Further, it incorporates a prompt learning paradigm to maximize the utilization of multi-label semantics for guiding representation learning. HashCoder combines pairwise and central constraints to ensure more discriminative hashing results. The pairwise constraint preserves the original local relevance structure, while the central constraint tackles the problem of semantic ambiguity in multi-label data by leveraging the global label distribution. Experimental results demonstrate that DMHSE achieves superior performance in multi-label micro-video retrieval tasks.
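The abstract names a pairwise constraint and a central constraint but gives no formulas, so the following is only a minimal sketch of what such constraints commonly look like in deep hashing, not the DMHSE losses themselves. It assumes a likelihood-style pairwise loss (two samples count as similar when they share at least one label) and a central target formed by taking the sign of the summed per-label centers. All function names, the toy labels, and the toy centers are hypothetical.

```python
import numpy as np

def binarize(u):
    """Map real-valued codes to {-1, +1}; zeros are sent to +1 (assumption)."""
    return np.where(u >= 0, 1, -1)

def hamming_distance(b1, b2):
    """Hamming distance between two {-1, +1} codes of equal length."""
    return int((b1.size - int(b1 @ b2)) // 2)

def pairwise_loss(u, labels):
    """Likelihood-style pairwise loss (assumed form, not taken from the paper).

    Samples sharing at least one label are treated as similar (s = 1);
    similar pairs are pushed toward large inner products.
    """
    s = (labels @ labels.T > 0).astype(float)
    theta = 0.5 * (u @ u.T)
    return float(np.mean(np.logaddexp(0.0, theta) - s * theta))

def central_loss(u, labels, centers):
    """Squared distance to a per-sample target code (assumed form).

    The target is the sign of the sum of the centers of the sample's labels,
    which is one simple way to handle multi-label samples.
    """
    target = binarize(labels @ centers)
    return float(np.mean((u - target) ** 2))

# Toy data: 3 samples, 2 labels, 4-bit codes (all values are illustrative).
labels = np.array([[1, 0], [1, 1], [0, 1]])
centers = np.array([[1, 1, -1, -1], [1, -1, 1, -1]])
targets = binarize(labels @ centers)

print(targets)
print(central_loss(targets.astype(float), labels, centers))
print(hamming_distance(targets[0], targets[2]))
```

In this toy setup the sample carrying both labels receives a target code that sits between the two single-label centers in Hamming space, which illustrates how a center-based target can reduce ambiguity for multi-label items.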
Pages: 5080-5091
Page count: 12
Related papers
50 in total
  • [21] Multi-modal information augmented model for micro-video recommendation
    Huo Y.
    Jin B.
    Liao Z.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58(06): 1142-1152
  • [22] Research on Micro-video Multi-Label Classification Based on Deep Multimodal Association Learning
    Li, Yun
    Lu, Zhixiang
    Liu, Shuyi
    Wang, Su
    Lü, Zimin
    Jing, Peiguang
    Data Analysis and Knowledge Discovery, 2024, 8(07): 77-88
  • [23] Flexible Dual Multi-Modal Hashing for Incomplete Multi-Modal Retrieval
    Wei, Yuhong
    An, Junfeng
    INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2024
  • [24] A multi-modal system for the retrieval of semantic video events
    Amir, A
    Basu, S
    Iyengar, G
    Lin, CY
    Naphade, M
    Smith, JR
    Srinivasan, S
    Tseng, B
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2004, 96(02): 216-236
  • [25] Deep Hashing With Walsh Domain for Multi-Label Image Retrieval
    Chen, Yinqi
    Li, Peiwen
    Zheng, Yangting
    Luo, Weijian
    Gao, Xiang
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32: 861-865
  • [26] Graph Convolutional Multi-Label Hashing for Cross-Modal Retrieval
    Shen, Xiaobo
    Chen, Yinfan
    Liu, Weiwei
    Zheng, Yuhui
    Sun, Quan-Sen
    Pan, Shirui
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
  • [27] DEEP HASHING MULTI-LABEL IMAGE RETRIEVAL WITH ATTENTION MECHANISM
    Xie, Wu
    Cui, Mengyin
    Liu, Manyi
    Wang, Peilei
    Qiang, Baohua
    INTERNATIONAL JOURNAL OF ROBOTICS & AUTOMATION, 2022, 37(04): 372-381
  • [28] Multimodal Attentive Representation Learning for Micro-video Multi-label Classification
    Jing, Peiguang
    Liu, Xianyi
    Zhang, Lijuan
    Li, Yun
    Liu, Yu
    Su, Yuting
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20(06)
  • [29] Deep Ranking Distribution Preserving Hashing for Robust Multi-Label Cross-Modal Retrieval
    Song, Ge
    Huang, Kai
    Su, Hanwen
    Song, Fengyi
    Yang, Ming
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 7027-7042
  • [30] Multi-modal Multi-label Semantic Indexing of Images using Unlabeled Data
    Li, Wei
    Sun, Maosong
    ALPIT 2008: SEVENTH INTERNATIONAL CONFERENCE ON ADVANCED LANGUAGE PROCESSING AND WEB INFORMATION TECHNOLOGY, PROCEEDINGS, 2008: 204-209