Unsupervised Video Hashing with Multi-granularity Contextualization and Multi-structure Preservation

Cited by: 4

Authors:

Hao, Yanbin [1]

Duan, Jingru [1]

Zhang, Hao [2]

Zhu, Bin [3]

Zhou, Pengyuan [1]

He, Xiangnan [1]

Affiliations:

[1] Univ Sci & Technol China, Langfang, Peoples R China

[2] Singapore Management Univ, Singapore, Singapore

[3] Univ Bristol, Bristol, England

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

Hashing; feature contextualization; unsupervised learning; video retrieval;

DOI:

10.1145/3503161.3547836

Chinese Library Classification:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Unsupervised video hashing aims to learn a compact binary vector that represents complex video content without using manual annotations. Existing unsupervised hashing methods generally explore only part of the dependencies (e.g., long-range and short-range) and data structures present in visual content, which results in less discriminative hash codes. In this paper, we propose a Multi-granularity Contextualized and Multi-Structure preserved Hashing (MCMSH) method, which simultaneously explores multiple axial contexts for discriminative video representation generation and various kinds of structural information for unsupervised learning. Specifically, we design three self-gating modules to separately model three granularities of dependencies (i.e., long/middle/short-range dependencies) and densely integrate them into MLP-Mixer for feature contextualization, leading to a novel model, MC-MLP. To facilitate unsupervised learning, we investigate three kinds of data structures, including clusters, local neighborhood similarity structure, and inter/intra-class variations, and design a multi-objective task to train MC-MLP. These data structures are highly complementary in hash code learning. We conduct extensive experiments on three video retrieval benchmark datasets, demonstrating that MCMSH not only significantly boosts the performance of the backbone MLP-Mixer but also notably outperforms competing methods. Code is available at https://github.com/haoyanbin918/MCMSH.

Citation

Pages: 3754-3763

Page count: 10

