Spatiotemporal distilled dense-connectivity network for video action recognition

Cited by: 41
Authors
Hao, Wangli [1 ,3 ]
Zhang, Zhaoxiang [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci CASIA Beijing, Inst Automat, CRIPAC, NLPR, Beijing 100190, Peoples R China
[2] Ctr Excellence Brain Sci & Intelligence Technol C, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci UCAS Beijing, Beijing 100190, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Two-stream; Action recognition; Dense-connectivity; Knowledge distillation;
DOI
10.1016/j.patcog.2019.03.005
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Two-stream convolutional neural networks show great promise for action recognition tasks. However, most two-stream approaches train the appearance and motion subnetworks independently, which can degrade performance due to the lack of interaction between the two streams. To overcome this limitation, we propose a Spatiotemporal Distilled Dense-Connectivity Network (STDDCN) for video action recognition. This network combines knowledge distillation with dense connectivity (adapted from DenseNet). Using the STDDCN architecture, we explore interaction strategies between the appearance and motion streams at different hierarchies. Specifically, block-level dense connections between the appearance and motion pathways enable spatiotemporal interaction at the feature representation layers. Moreover, knowledge distillation between the two streams (each treated as a student) and their final fusion (treated as the teacher) allows both streams to interact at the high-level layers. This architecture lets STDDCN gradually obtain effective hierarchical spatiotemporal features, and it can be trained end-to-end. Finally, extensive ablation studies validate the effectiveness and generalization of our model on two benchmark datasets, UCF101 and HMDB51, on which it achieves promising performance. (C) 2019 Elsevier Ltd. All rights reserved.
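To make the distillation idea in the abstract concrete, below is a minimal PyTorch sketch of a loss in which both streams act as students and their late fusion acts as the teacher. It is an illustration only, not the authors' implementation: the fusion-by-averaging choice, the temperature and weighting values, and all function names are assumptions, and the block-level dense connections between the two pathways are not shown.

```python
# Illustrative sketch: two-stream distillation where the fused prediction
# teaches both streams. Assumptions (not from the paper): average fusion,
# temperature 3.0, equal CE/KD weighting.
import torch
import torch.nn.functional as F

def two_stream_distill_loss(appearance_logits, motion_logits, labels,
                            temperature=3.0, alpha=0.5):
    """Cross-entropy per stream plus KL distillation toward the fused teacher."""
    # Teacher: late fusion of the two streams (simple average; assumption).
    teacher_logits = (appearance_logits + motion_logits) / 2.0
    teacher_soft = F.softmax(teacher_logits.detach() / temperature, dim=1)

    total = 0.0
    for student_logits in (appearance_logits, motion_logits):
        ce = F.cross_entropy(student_logits, labels)
        kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                      teacher_soft, reduction='batchmean') * temperature ** 2
        total = total + (1 - alpha) * ce + alpha * kd
    return total

if __name__ == "__main__":
    # Toy usage: random tensors stand in for the two streams' class scores.
    B, C = 4, 101  # batch size, number of classes (e.g., UCF101)
    app = torch.randn(B, C, requires_grad=True)
    mot = torch.randn(B, C, requires_grad=True)
    y = torch.randint(0, C, (B,))
    loss = two_stream_distill_loss(app, mot, y)
    loss.backward()
    print(float(loss))
```

Detaching the teacher logits keeps gradients from flowing into the fused prediction, so each stream is pulled toward the fusion rather than the fusion being pulled toward the weaker stream; in the paper's end-to-end setting this detail may differ.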
Pages: 13 - 24
Number of pages: 12
Related papers
50 records in total
  • [41] Video action recognition method based on attention residual network and LSTM
    Zhang, Yu
    Dong, Pengyue
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 3611 - 3616
  • [42] MNv3-MFAE: A Lightweight Network for Video Action Recognition
    Liu, Jie
    Liu, Wenyue
    Han, Ke
    ELECTRONICS, 2025, 14 (05):
  • [43] CANet: Comprehensive Attention Network for video-based action recognition
    Gao, Xiong
    Chang, Zhaobin
    Ran, Xingcheng
    Lu, Yonggang
    KNOWLEDGE-BASED SYSTEMS, 2024, 296
  • [44] MLENet: Multi-Level Extraction Network for video action recognition
    Wang, Fan
    Li, Xinke
    Xiong, Han
    Mo, Haofan
    Li, Yongming
    PATTERN RECOGNITION, 2024, 154
  • [45] Exploiting Spatiotemporal Features for Action Recognition
    Bin Muslim, Usairam
    Khan, Muhammad Hassan
    Farid, Muhammad Shahid
    PROCEEDINGS OF 2021 INTERNATIONAL BHURBAN CONFERENCE ON APPLIED SCIENCES AND TECHNOLOGIES (IBCAST), 2021, : 613 - 619
  • [46] Recurrent Region Attention and Video Frame Attention Based Video Action Recognition Network Design
    Sang H.-F.
    Zhao Z.-Y.
    He D.-K.
    Chinese Institute of Electronics, 48: 1052 - 1061
  • [47] Action recognition on continuous video
    Chang, Y. L.
    Chan, C. S.
    Remagnino, P.
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (04) : 1233 - 1243
  • [48] Action recognition on continuous video
    Y. L. Chang
    C. S. Chan
    P. Remagnino
    Neural Computing and Applications, 2021, 33 : 1233 - 1243
  • [49] Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition
    Wang, Mengmeng
    Xing, Jiazheng
    Su, Jing
    Chen, Jun
    Liu, Yong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 3347 - 3362
  • [50] An efficient video transformer network with token discard and keyframe enhancement for action recognition
    Zhang, Qian
    Yang, Zuosui
    Shao, Mingwen
    Liang, Hong
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (02)