MMM-GCN: Multi-Level Multi-Modal Graph Convolution Network for Video-Based Person Identification

Cited by: 0

Authors:

Liao, Ziyan [1]

Di, Dening [1]

Hao, Jingsong [1]

Zhang, Jiang [1]

Zhu, Shulei [1]

Yin, Jun [1]

Affiliation:

[1] Dahua Technol Co Ltd, Hangzhou, Peoples R China

Source:

MULTIMEDIA MODELING, MMM 2023, PT I | 2023 / Vol. 13833

Keywords:

Person identification; Multi-modal; Multi biometrics; GCN; Feature fusion;

DOI:

10.1007/978-3-031-27077-2_1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video-based multi-modal person identification has attracted growing research interest as a way to address the inadequacies of single-modal identification in unconstrained scenes. Most existing methods model the video-level and multi-modal-level information of a target video separately, and therefore suffer from the separation of these levels and from the limited information contained in a single video. In this paper, we introduce additional neighbor-level information for the first time to enrich the information available for the target video. We then propose a Multi-Level (neighbor-level, multi-modal-level, and video-level) and Multi-Modal GCN model that captures correlations among the different levels and achieves adaptive fusion in a unified model. Experiments on the iQIYI-VID-2019 dataset show that MMM-GCN significantly outperforms current state-of-the-art methods, demonstrating its effectiveness. We also point out that feature fusion is heavily polluted by noisy nodes, which leads to suboptimal results. Further improvements could build on this work to approach the performance upper bound of our paradigm.
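
As an illustration of the kind of graph-based fusion the abstract refers to, the following is a minimal sketch of one generic graph convolution step, assuming the standard symmetric-normalized propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} H W). It is not the authors' MMM-GCN architecture: the toy graph, the node roles (a target video's face feature, its audio feature, and a neighbor video's face feature), the feature values, and all function names are hypothetical and chosen only for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalize_adjacency(adj), feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical toy graph (assumption, not from the paper):
#   node 0 = target video, face feature
#   node 1 = target video, audio feature
#   node 2 = neighbor video, face feature
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
feats = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
weight = [[1.0, 0.0],   # identity weight, so only the aggregation is visible
          [0.0, 1.0]]

fused = gcn_layer(adj, feats, weight)
print(fused[0])  # fused representation of the target node
```

In this sketch the target node's output mixes its own feature with those of the other modality and the neighbor, weighted by node degree. A learned weight matrix and some mechanism for down-weighting noisy nodes would replace the identity weight in any practical model.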


Pages: 3-15

Number of pages: 13

