Human Action Recognition Network Based on Improved Channel Attention Mechanism

Cited by: 4

Authors:

Chen Ying [1]

Gong Suming [1]

Affiliations:

[1] Jiangnan Univ, Minist Educ, Key Lab Adv Proc Control Light Ind, Wuxi 214122, Jiangsu, Peoples R China

Source:

JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY | 2021 / Vol. 43 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Action recognition; Channel attention; Spatiotemporal feature; Depth-wise-Separable (DS) convolution;

DOI:

10.11999/JEIT200431

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Existing channel attention mechanisms use global average pooling to generate channel-wise statistics and therefore ignore local spatial information. To address this problem, two improved channel attention modules are proposed for human action recognition: the Spatial-Temporal (ST) interaction block, based on matrix operations, and the Depth-wise-Separable (DS) block. The ST block extracts a spatiotemporal weighted information sequence for each channel through convolution and dimension conversion operations, and then obtains the attention weight of each channel through convolution. The DS block first uses depth-wise separable convolution to obtain the local spatial information of each channel, and then compresses the channel size so that it has a global receptive field. The attention weight of each channel is obtained via a convolution operation, which completes feature recalibration with the channel attention mechanism. The proposed attention blocks are inserted into the basic network and evaluated on the popular UCF101 and HMDB51 datasets, and the results show that the accuracy is improved.
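The limitation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy 4x4 feature maps, and the Laplacian-style kernel are all assumptions chosen for illustration, and the learned layers of a real attention block are omitted. It only shows that two channels with different spatial layouts can share the same global-average-pooling statistic, whereas a per-channel (depth-wise) convolution applied before pooling tells them apart.

```python
import math

def global_avg_pool(x):
    """x is a list of channels, each an H x W nested list; returns one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

def depthwise_conv3x3(x, kernels):
    """Apply one 3x3 kernel per channel, with no cross-channel mixing and no padding."""
    out = []
    for ch, k in zip(x, kernels):
        h, w = len(ch), len(ch[0])
        out.append([[sum(ch[i + di][j + dj] * k[di][dj]
                         for di in range(3) for dj in range(3))
                     for j in range(w - 2)]
                    for i in range(h - 2)])
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def recalibrate(x, weights):
    """Scale every value of a channel by that channel's attention weight."""
    return [[[v * w for v in row] for row in ch] for ch, w in zip(x, weights)]

# Hypothetical input: channel 0 is uniform, channel 1 is a single bright pixel.
uniform = [[1.0] * 4 for _ in range(4)]
spike = [[0.0] * 4 for _ in range(4)]
spike[1][1] = 16.0
x = [uniform, spike]

# Assumed fixed kernel (Laplacian-style); a trained block would learn its kernels.
laplacian = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]

gap_stats = global_avg_pool(x)                                          # [1.0, 1.0]
local_stats = global_avg_pool(depthwise_conv3x3(x, [laplacian] * 2))    # [0.0, 8.0]

# Attention weights from each descriptor, then feature recalibration.
gap_weights = [sigmoid(v) for v in gap_stats]
local_weights = [sigmoid(v) for v in local_stats]
recalibrated = recalibrate(x, local_weights)
```

With plain global average pooling both channels receive the same weight, so the attention cannot distinguish them. With the depth-wise step the uniform channel and the spike channel produce different descriptors and therefore different weights.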


Pages: 3538-3545

Page count: 8
