MMM-GCN: Multi-Level Multi-Modal Graph Convolution Network for Video-Based Person Identification

Cited by: 0
Authors
Liao, Ziyan [1]
Di, Dening [1]
Hao, Jingsong [1]
Zhang, Jiang [1]
Zhu, Shulei [1]
Yin, Jun [1]
Affiliations
[1] Dahua Technol Co Ltd, Hangzhou, Peoples R China
Source
MULTIMEDIA MODELING, MMM 2023, PT I | 2023, Vol. 13833
Keywords
Person identification; Multi-modal; Multi biometrics; GCN; Feature fusion;
DOI
10.1007/978-3-031-27077-2_1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video-based multi-modal person identification has recently attracted growing research interest as a way to address the inadequacies of single-modal identification in unconstrained scenes. Most existing methods model the video-level and multi-modal-level information of the target video separately, and therefore suffer from the separation of different levels and from the insufficient information contained in a single video. In this paper, we introduce extra neighbor-level information for the first time to enhance the informativeness of the target video. We then propose a Multi-Level (neighbor-level, multi-modal-level, and video-level) and Multi-Modal GCN model that captures the correlation among different levels and achieves adaptive fusion in a unified model. Experiments on the iQIYI-VID-2019 dataset show that MMM-GCN significantly outperforms current state-of-the-art methods, demonstrating its superiority and effectiveness. In addition, we point out that feature fusion is heavily polluted by noisy nodes, which leads to suboptimal results. Further improvements could be explored on this basis to approach the performance upper bound of our paradigm.
Pages: 3-15
Page count: 13