Deep Multi-Modal Hashing With Semantic Enhancement for Multi-Label Micro-Video Retrieval

Cited by: 1

Authors:

Jing, Peiguang [1]

Sun, Haoyi [2]

Nie, Liqiang [3]

Li, Yun [4,5]

Su, Yuting [1]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[2] Tianjin Univ, Sch Future Technol, Tianjin 300072, Peoples R China

[3] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China

[4] Guangxi Univ Finance & Econ, Sch Big Data & Artificial Intelligence, Guangxi 530001, Peoples R China

[5] Guangxi Key Lab Big Data Finance & Econ, Nanning 530001, Guangxi, Peoples R China

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2024 / Vol. 36 / No. 10

Funding:

National Natural Science Foundation of China

Keywords:

Semantics; Hash functions; Encoding; Representation learning; Convolutional neural networks; Quantization (signal); Kernel; Deep hashing; micro-video retrieval; multi-label; multi-modality; MAXIMUM-LIKELIHOOD; QUANTIZATION;

DOI:

10.1109/TKDE.2023.3337077

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The pressing need for low storage and high efficiency has significantly propelled the advancement of deep hashing techniques in large-scale search and retrieval tasks. As one of the most prevalent forms of user-generated content, micro-videos usually exhibit more complicated multi-modal behaviors, which pose further challenges in multi-label retrieval. Existing multi-modal hashing methods tend to prioritize complementarity and consistency in multi-modal fusion while neglecting the completeness problem. In this paper, we propose a deep multi-modal hashing with semantic enhancement (DMHSE) method that effectively integrates complete multi-modal representation learning with discriminative binary coding through collaboration between two distinct encoders, FoldCoder and HashCoder. FoldCoder translates latent multi-modal representation learning into a degradation process by mimicking data transmission. Further, it incorporates a prompt learning paradigm to maximize the utilization of multi-label semantics for guiding representation learning. HashCoder combines pairwise and central constraints to ensure more discriminative hashing results. The pairwise constraint preserves the original local relevance structure, while the central constraint tackles the problem of semantic ambiguity in multi-label data by leveraging the global label distribution. Experimental results demonstrate that DMHSE achieves superior performance in multi-label micro-video retrieval tasks.


Pages: 5080-5091

Page count: 12
