Cross-modality online distillation for multi-view action recognition

Citations: 14
Authors
Xu, Chao [1 ,2 ]
Wu, Xia [1 ,2 ]
Li, Yachun [1 ,2 ]
Jin, Yining [3 ]
Wang, Mengmeng [1 ,2 ]
Liu, Yong [1 ,2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Ind Control Technol, Hangzhou, Peoples R China
[2] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
National Natural Science Foundation of China
Keywords
Multi-view; Cross-modality; Action recognition; Online distillation; MODEL; NETWORK;
DOI
10.1016/j.neucom.2021.05.077
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, multi-modality features have been introduced into multi-view action recognition methods to obtain more robust performance. However, not all modalities are available in real applications; for example, daily scenes lack depth data and capture only RGB sequences. This raises the challenge of learning critical features from multi-modality data at train time while still achieving robust performance from RGB sequences alone at test time. To address this challenge, our paper presents a novel two-stage teacher-student framework. The teacher network takes advantage of multi-view geometry-and-texture features during training, while the student network is given only RGB sequences at test time. Specifically, in the first stage, a Cross-modality Aggregated Transfer (CAT) network is proposed to transfer multi-view cross-modality aggregated features from the teacher network to the student network. Moreover, we design a Viewpoint-Aware Attention (VAA) module that captures discriminative information across different views to combine multi-view features effectively. In the second stage, a Multi-view Features Strengthen (MFS) network with the VAA module further strengthens the global view-invariant features of the student network. Both CAT and MFS learn in an online distillation manner, so that the teacher and student networks can be trained jointly. Extensive experiments on IXMAS and Northwestern-UCLA demonstrate the effectiveness of our proposed method. (c) 2021 Elsevier B.V. All rights reserved.
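The core mechanism the abstract describes, distilling a multi-modality teacher's knowledge into an RGB-only student while both networks train jointly, can be illustrated with a minimal NumPy sketch of the per-batch distillation loss. This is a generic soft-target distillation objective, not the paper's actual CAT or MFS networks; the function names, the temperature `T`, and the weight `alpha` are illustrative assumptions, and in the online setting each network would receive a loss of this form so the pair can be optimized together.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combined online-distillation loss (hypothetical sketch).

    KL(teacher || student) on temperature-softened outputs transfers the
    teacher's multi-view, multi-modality knowledge to the RGB-only student;
    cross-entropy keeps the student anchored to ground-truth action labels.
    """
    p_t = softmax(teacher_logits, T)  # soft teacher targets
    p_s = softmax(student_logits, T)  # soft student predictions
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
    # T^2 rescales the soft-target gradient magnitude to match the hard-label term
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce))

# Example: a batch of 2 clips, 5 action classes
rng = np.random.default_rng(0)
teacher = rng.normal(size=(2, 5))  # logits from the multi-modality teacher
student = rng.normal(size=(2, 5))  # logits from the RGB-only student
labels = np.array([1, 3])
loss = distillation_loss(student, teacher, labels)
```

In a joint (online) training loop, a symmetric term with the roles swapped would update the teacher as well, which is what distinguishes online distillation from the conventional fixed-teacher setup.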
Pages: 384-393
Page count: 10