Segment differential aggregation representation and supervised compensation learning of ConvNets for human action recognition

Cited by: 2

Authors:

Ren, ZiLiang ^{[1,2]}

Zhang, QieShi ^{[2]}

Cheng, Qin ^{[2,3]}

Xu, ZhenYu ^{[2]}

Yuan, Shuai ^{[4]}

Luo, DeLin ^{[5]}

Affiliations:

[1] Dongguan Univ Technol, Sch Comp Sci & Technol, Dongguan 523808, Peoples R China

[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China

[3] Guilin Univ Elect Technol, Sch Elect Engn & Automat, Guilin 541004, Peoples R China

[4] Erlangen Nuremberg Univ, Dept Mech Engn, D-91508 Erlangen, Germany

[5] Xiamen Univ, Sch Aerosp Engn, Xiamen 361102, Peoples R China

Source:

SCIENCE CHINA-TECHNOLOGICAL SCIENCES | 2024 / Vol. 67 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

action recognition; segment frame difference aggregation; supervised compensation learning; ConvNets; CONVOLUTION NEURAL-NETWORKS; RGB-D; ATTENTION;

DOI:

10.1007/s11431-023-2491-4

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08 ;

Abstract:

With more multimodal data available for visual classification tasks, human action recognition has become an increasingly attractive topic. However, one of the main challenges is to effectively extract complementary features from different modalities for action recognition. In this work, a novel multimodal supervised learning framework based on convolutional neural networks (ConvNets) is proposed to facilitate extracting the compensation features from different modalities for human action recognition. Built on an information aggregation mechanism and deep ConvNets, our recognition framework represents spatial-temporal information from the base modalities with a purpose-designed frame difference aggregation spatial-temporal module (FDA-STM), and the network bridges information from skeleton data through a multimodal supervised compensation block (SCB) to supervise the extraction of compensation features. We evaluate the proposed recognition framework on three human action datasets: NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD. The results demonstrate that our model with FDA-STM and SCB achieves state-of-the-art recognition performance on all three benchmark datasets.
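As a rough illustration of the general idea behind segment-wise frame difference aggregation (not the FDA-STM module itself, whose design is not specified in this record), the following sketch splits a clip into segments and averages the absolute differences between consecutive frames within each segment. The function name `segment_frame_difference`, the contiguous equal-size segmentation, and the mean-of-absolute-differences aggregation are all assumptions made for illustration only.

```python
def segment_frame_difference(frames, num_segments):
    """Aggregate frame differences per segment (illustrative sketch only).

    frames: list of T frames, each a 2D list (H x W) of numbers.
    Returns a list of num_segments maps (H x W), each the mean absolute
    difference between consecutive frames inside that segment.
    This is an assumed, simplified aggregation, not the paper's FDA-STM.
    """
    total = len(frames)
    if num_segments < 1 or total < 2 * num_segments:
        raise ValueError("need at least two frames per segment")

    # Split the clip into contiguous segments of near-equal length.
    base, extra = divmod(total, num_segments)
    maps = []
    start = 0
    for s in range(num_segments):
        size = base + (1 if s < extra else 0)
        segment = frames[start:start + size]
        start += size

        height, width = len(segment[0]), len(segment[0][0])
        acc = [[0.0] * width for _ in range(height)]

        # Accumulate |frame[t+1] - frame[t]| over the segment.
        for prev, cur in zip(segment, segment[1:]):
            for i in range(height):
                for j in range(width):
                    acc[i][j] += abs(cur[i][j] - prev[i][j])

        pairs = len(segment) - 1
        maps.append([[value / pairs for value in row] for row in acc])
    return maps
```

In a pipeline of this kind, each aggregated map would stand in for one segment's motion and could be fed to a 2D ConvNet in place of the raw frames; how the paper actually combines these maps with skeleton supervision is not described here.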


Pages: 197-208

Page count: 12
