Cross-modality online distillation for multi-view action recognition

Cited by: 14
Authors
Xu, Chao [1 ,2 ]
Wu, Xia [1 ,2 ]
Li, Yachun [1 ,2 ]
Jin, Yining [3 ]
Wang, Mengmeng [1 ,2 ]
Liu, Yong [1 ,2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab Ind Control Technol, Hangzhou, Peoples R China
[2] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Peoples R China
[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada
Funding
National Natural Science Foundation of China;
Keywords
Multi-view; Cross-modality; Action recognition; Online distillation; MODEL; NETWORK;
DOI
10.1016/j.neucom.2021.05.077
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Recently, multi-modality features have been introduced into multi-view action recognition methods to obtain more robust performance. However, not all modalities are available in real applications; for example, daily scenes often lack depth data and capture RGB sequences only. This raises the challenge of learning critical features from multi-modality data at training time while still achieving robust performance from RGB sequences alone at test time. To address this challenge, this paper presents a novel two-stage teacher-student framework. The teacher network exploits multi-view geometry and texture features during training, while the student network is given only RGB sequences at test time. Specifically, in the first stage, a Cross-modality Aggregated Transfer (CAT) network is proposed to transfer multi-view cross-modality aggregated features from the teacher network to the student network. Moreover, a Viewpoint-Aware Attention (VAA) module is designed to capture discriminative information across different views and combine multi-view features effectively. In the second stage, a Multi-view Features Strengthen (MFS) network with the VAA module further strengthens the global view-invariant features of the student network. In addition, both CAT and MFS learn in an online distillation manner, so that the teacher and student networks can be trained jointly. Extensive experiments on IXMAS and Northwestern-UCLA demonstrate the effectiveness of the proposed method. (c) 2021 Elsevier B.V. All rights reserved.
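To make the training scheme described in the abstract concrete, the sketch below illustrates one plausible online-distillation step in PyTorch: a teacher encoder sees RGB plus depth features from every view, a student sees RGB only, a simple attention module fuses per-view features (standing in for the VAA idea), and both networks are updated jointly with a task loss plus feature-level and soft-label distillation terms. This is a minimal sketch under stated assumptions, not the authors' CAT/MFS implementation: the class names (ViewpointAwareAttention, Encoder), the function online_distillation_step, all tensor shapes, and the hyper-parameters tau and alpha are illustrative choices.

# Minimal, illustrative sketch of joint (online) teacher-student training.
# All module names, shapes, and hyper-parameters are assumptions for
# illustration only; they are NOT the paper's CAT/MFS networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewpointAwareAttention(nn.Module):
    """Hypothetical stand-in for the VAA module: learns a weight per view
    and fuses per-view features into a single view-invariant feature."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                             # feats: (batch, n_views, dim)
        weights = F.softmax(self.score(feats), dim=1)      # attention over views
        return (weights * feats).sum(dim=1)                # (batch, dim)

class Encoder(nn.Module):
    """Toy per-view encoder; a real model would use a CNN/3D-CNN backbone."""
    def __init__(self, in_dim, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):                                  # x: (batch, n_views, in_dim)
        return self.net(x)

def online_distillation_step(teacher, student, fuse_t, fuse_s, cls_t, cls_s,
                             rgb, depth, labels, opt, tau=2.0, alpha=0.5):
    """One joint training step: the teacher sees RGB+depth from all views,
    the student sees RGB only, and both are optimized together."""
    t_feat = fuse_t(teacher(torch.cat([rgb, depth], dim=-1)))   # multi-modality teacher
    s_feat = fuse_s(student(rgb))                               # RGB-only student
    t_logits, s_logits = cls_t(t_feat), cls_s(s_feat)

    task_loss = F.cross_entropy(t_logits, labels) + F.cross_entropy(s_logits, labels)
    # Feature-level transfer plus temperature-scaled soft-label distillation,
    # with the teacher detached so knowledge flows teacher -> student.
    kd_loss = F.mse_loss(s_feat, t_feat.detach()) + F.kl_div(
        F.log_softmax(s_logits / tau, dim=1),
        F.softmax(t_logits.detach() / tau, dim=1),
        reduction="batchmean") * tau ** 2

    loss = task_loss + alpha * kd_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example usage with random tensors (batch=4, views=3, RGB dim=128, depth dim=64):
teacher, student = Encoder(128 + 64, 256), Encoder(128, 256)
fuse_t, fuse_s = ViewpointAwareAttention(256), ViewpointAwareAttention(256)
cls_t, cls_s = nn.Linear(256, 12), nn.Linear(256, 12)
opt = torch.optim.Adam([*teacher.parameters(), *student.parameters(),
                        *fuse_t.parameters(), *fuse_s.parameters(),
                        *cls_t.parameters(), *cls_s.parameters()], lr=1e-4)
rgb, depth = torch.randn(4, 3, 128), torch.randn(4, 3, 64)
labels = torch.randint(0, 12, (4,))
online_distillation_step(teacher, student, fuse_t, fuse_s, cls_t, cls_s,
                         rgb, depth, labels, opt)

At test time only student, fuse_s, and cls_s would be kept, which mirrors the abstract's setting of multi-modality, multi-view supervision during training and RGB-only inference.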
Pages: 384-393
Number of pages: 10
Related papers
50 records in total
  • [41] DLFace: Deep local descriptor for cross-modality face recognition
    Peng, Chunlei
    Wang, Nannan
    Li, Jie
    Gao, Xinbo
    PATTERN RECOGNITION, 2019, 90 : 161 - 171
  • [42] Modality Distillation with Multiple Stream Networks for Action Recognition
    Garcia, Nuno C.
    Morerio, Pietro
    Murino, Vittorio
    COMPUTER VISION - ECCV 2018, PT VIII, 2018, 11212 : 106 - 121
  • [43] A Multi-modal & Multi-view & Interactive Benchmark Dataset for Human Action Recognition
    Xu, Ning
    Liu, Anan
    Nie, Weizhi
    Wong, Yongkang
    Li, Fuwu
    Su, Yuting
    MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, 2015, : 1195 - 1198
  • [44] Temporal Self-Similarity for Appearance-Based Action Recognition in Multi-View Setups
    Koerner, Marco
    Denzler, Joachim
    COMPUTER ANALYSIS OF IMAGES AND PATTERNS, PT I, 2013, 8047 : 163 - 171
  • [45] Multi-view daily action recognition based on Hooke balanced matrix and broad learning system
    Liu, Zhigang
    Lu, Bingshuo
    Wu, Yin
    Gao, Chunlei
    IMAGE AND VISION COMPUTING, 2024, 143
  • [46] A Survey of Multi-view Gait Recognition
    Wang K.-J.
    Ding X.-N.
    Xing X.-L.
    Liu M.-C.
Zidonghua Xuebao/Acta Automatica Sinica, 2019, 45 (05): 841 - 852
  • [47] A View-Invariant Action Recognition Based on Multi-View Space Hidden Markov Models
    Ji, Xiaofei
    Wang, Ce
    Li, Yibo
    INTERNATIONAL JOURNAL OF HUMANOID ROBOTICS, 2014, 11 (01)
  • [48] Multi-view transition HMMs based view-invariant human action recognition method
    Xiaofei Ji
    Zhaojie Ju
    Ce Wang
    Changhui Wang
    Multimedia Tools and Applications, 2016, 75 : 11847 - 11864
  • [49] Multi-view transition HMMs based view-invariant human action recognition method
    Ji, Xiaofei
    Ju, Zhaojie
    Wang, Ce
    Wang, Changhui
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (19) : 11847 - 11864
  • [50] Multi-view action recognition based on action volumes, fuzzy distances and cluster discriminant analysis
    Iosifidis, Alexandros
    Tefas, Anastasios
    Pitas, Ioannis
    SIGNAL PROCESSING, 2013, 93 (06) : 1445 - 1457