Deep Multi-Modal Hashing With Semantic Enhancement for Multi-Label Micro-Video Retrieval

Cited by: 1

Authors:

Jing, Peiguang [1]

Sun, Haoyi [2]

Nie, Liqiang [3]

Li, Yun [4, 5]

Su, Yuting [1]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[2] Tianjin Univ, Sch Future Technol, Tianjin 300072, Peoples R China

[3] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China

[4] Guangxi Univ Finance & Econ, Sch Big Data & Artificial Intelligence, Guangxi 530001, Peoples R China

[5] Guangxi Key Lab Big Data Finance & Econ, Nanning 530001, Guangxi, Peoples R China

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2024 / Vol. 36 / No. 10

Funding:

National Natural Science Foundation of China;

Keywords:

Semantics; Hash functions; Encoding; Representation learning; Convolutional neural networks; Quantization (signal); Kernel; Deep hashing; micro-video retrieval; multi-label; multi-modality; MAXIMUM-LIKELIHOOD; QUANTIZATION;

DOI:

10.1109/TKDE.2023.3337077

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The pressing need for low storage and high efficiency has significantly propelled the advancement of deep hashing techniques in the realm of large-scale search and retrieval tasks. As one of the most prevailing forms of user-generated contents, micro-videos usually represent more complicated multi-modal behaviors that are further challenged in multi-label retrieval. Existing multi-modal hashing methods tend to prioritize the complementarity and consistency in multi-modal fusion, while neglecting the completeness problem. In this paper, we propose a deep multi-modal hashing with semantic enhancement (DMHSE) method that effectively integrates complete multi-modal representation learning with discriminative binary coding by means of collaboration between two distinct encoders, FoldCoder and HashCoder. FoldCoder translates latent multi-modal representation learning to a degradation process through mimicking data transmitting. Further, it incorporates a prompt learning paradigm to maximize the utilization of multi-label semantics for guiding representation learning. HashCoder combines pairwise and central constraints to ensure more discriminative hashing results. Pairwise constraint preserves the original local relevance structure, while central constraint tackles the problem of semantic ambiguity in multi-label data by leveraging the global label distribution. Experimental results demonstrate that DMHSE achieves superior performance in multi-label micro-video retrieval tasks.
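
As a rough illustration of the two HashCoder constraints named in the abstract, the Python sketch below shows one common way such terms are written in the deep hashing literature. The abstract does not give DMHSE's actual formulation, so everything here is an assumption made for illustration: the likelihood-style pairwise loss, the averaged label center used as the central target, and all function names (`sign_code`, `hamming`, `pairwise_loss`, `label_center`, `central_loss`).

```python
import math


def sign_code(values):
    """Binarize real-valued outputs into a {-1, +1} hash code."""
    return [1 if v >= 0 else -1 for v in values]


def hamming(a, b):
    """Count the positions where two equal-length codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def pairwise_loss(u, v, similar):
    """Assumed pairwise term: negative log-likelihood of a similarity label.

    theta = 0.5 * <u, v>; loss = log(1 + exp(theta)) - s * theta,
    where s is 1 for a similar pair and 0 otherwise.
    """
    theta = 0.5 * sum(x * y for x, y in zip(u, v))
    # Numerically stable evaluation of log(1 + exp(theta)).
    softplus = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
    return softplus - (1.0 if similar else 0.0) * theta


def label_center(labels, centers):
    """Assumed central target: mean of the centers of the active labels."""
    active = [c for flag, c in zip(labels, centers) if flag]
    if not active:
        raise ValueError("at least one label must be active")
    return [sum(column) / len(active) for column in zip(*active)]


def central_loss(u, target):
    """Assumed central term: mean squared distance to the target center."""
    return sum((x - t) ** 2 for x, t in zip(u, target)) / len(u)
```

In a setup like this, a training objective would typically sum the pairwise term over sampled item pairs and the central term over items, and retrieval would rank database codes by `hamming` distance to the query code. How DMHSE weights or defines these terms is not stated in this record.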

Pages: 5080 - 5091

Page count: 12

