Cross-modality online distillation for multi-view action recognition

Citations: 14
Authors
Xu, Chao [1 ,2 ]
Wu, Xia [1 ,2 ]
Li, Yachun [1 ,2 ]
Jin, Yining [3 ]
Wang, Mengmeng [1 ,2 ]
Liu, Yong [1 ,2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Ind Control Technol, Hangzhou, Peoples R China
[2] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
National Natural Science Foundation of China
Keywords
Multi-view; Cross-modality; Action recognition; Online distillation; MODEL; NETWORK;
DOI
10.1016/j.neucom.2021.05.077
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, multi-modality features have been introduced into multi-view action recognition methods to obtain more robust performance. However, not all modalities are available in real applications; for example, daily scenes lack depth data and capture only RGB sequences. This raises the challenge of learning critical features from multi-modality data at train time while still achieving robust performance from RGB sequences alone at test time. To address this challenge, our paper presents a novel two-stage teacher-student framework. The teacher network takes advantage of multi-view geometry-and-texture features during training, while the student network is given only RGB sequences at test time. Specifically, in the first stage, a Cross-modality Aggregated Transfer (CAT) network is proposed to transfer multi-view cross-modality aggregated features from the teacher network to the student network. Moreover, we design a Viewpoint-Aware Attention (VAA) module that captures discriminative information across different views to combine multi-view features effectively. In the second stage, a Multi-view Features Strengthen (MFS) network with the VAA module further strengthens the global view-invariant features of the student network. Both CAT and MFS learn in an online distillation manner, so that the teacher and student networks can be trained jointly. Extensive experiments on IXMAS and Northwestern-UCLA demonstrate the effectiveness of our proposed method. (c) 2021 Elsevier B.V. All rights reserved.
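The core mechanism the abstract describes, distilling a multi-modality teacher's knowledge into an RGB-only student while both networks train jointly, can be illustrated with a minimal NumPy sketch of the per-batch distillation loss. This is a generic soft-target distillation objective, not the paper's actual CAT or MFS networks; the function names, the temperature `T`, and the weight `alpha` are illustrative assumptions, and in the online setting each network would receive a loss of this form so the pair can be optimized together.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combined online-distillation loss (hypothetical sketch).

    KL(teacher || student) on temperature-softened outputs transfers the
    teacher's multi-view, multi-modality knowledge to the RGB-only student;
    cross-entropy keeps the student anchored to ground-truth action labels.
    """
    p_t = softmax(teacher_logits, T)  # soft teacher targets
    p_s = softmax(student_logits, T)  # soft student predictions
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    # T^2 rescales the soft-target gradient magnitude to match the hard-label term
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce))

# Example: a batch of 2 clips, 5 action classes
rng = np.random.default_rng(0)
teacher = rng.normal(size=(2, 5))  # logits from the multi-modality teacher
student = rng.normal(size=(2, 5))  # logits from the RGB-only student
labels = np.array([1, 3])
loss = distillation_loss(student, teacher, labels)
```

In a joint (online) training loop, a symmetric term with the roles swapped would update the teacher as well, which is what distinguishes online distillation from the conventional fixed-teacher setup.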
Pages: 384-393
Page count: 10