Attention-Based Temporal Weighted Convolutional Neural Network for Action Recognition

Cited by: 56

Authors:

Zang, Jinliang [1]

Wang, Le [1]

Liu, Ziyi [1]

Zhang, Qilin [2]

Niu, Zhenxing

Hua, Gang [3]

Zheng, Nanning [1]

Affiliations:

[1] Xi'an Jiaotong University, Xi'an 710049, Shaanxi, People's Republic of China

[2] HERE Technologies, Chicago, IL 60606 USA

[3] Microsoft Research, Redmond, WA 98052 USA

Source:

ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2018 | 2018 / Vol. 519

Funding:

China Postdoctoral Science Foundation;

Keywords:

Action recognition; Attention model; Convolutional neural networks; Video-level prediction; Temporal weighting;

DOI:

10.1007/978-3-319-92007-8_9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Research in human action recognition has accelerated significantly since the introduction of powerful machine learning tools such as Convolutional Neural Networks (CNNs). However, effective and efficient methods for incorporation of temporal information into CNNs are still being actively explored in the recent literature. Motivated by the popular recurrent attention models in the research area of natural language processing, we propose the Attention-based Temporal Weighted CNN (ATW), which embeds a visual attention model into a temporal weighted multi-stream CNN. This attention model is simply implemented as temporal weighting yet it effectively boosts the recognition performance of video representations. Besides, each stream in the proposed ATW framework is capable of end-to-end training, with both network parameters and temporal weights optimized by stochastic gradient descent (SGD) with back-propagation. Our experiments show that the proposed attention mechanism contributes substantially to the performance gains with the more discriminative snippets by focusing on more relevant video segments.
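The temporal weighting described in the abstract can be illustrated with a minimal sketch. The abstract states only that attention is implemented as temporal weights over video snippets; the softmax normalization, the function names (`softmax`, `temporal_weighted_scores`), and the toy scores below are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(xs):
    """Normalize raw values into positive weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_weighted_scores(snippet_scores, attention_logits):
    """Combine per-snippet class scores into one video-level score vector.

    snippet_scores: one list of class scores per snippet (hypothetical CNN outputs).
    attention_logits: one raw temporal weight per snippet (assumed learnable).
    """
    weights = softmax(attention_logits)
    num_classes = len(snippet_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, snippet_scores))
        for c in range(num_classes)
    ]


# Toy data: 3 snippets, 2 classes. Only the first snippet favors class 0.
snippet_scores = [[2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]

# Equal logits reduce to plain averaging over snippets.
uniform = temporal_weighted_scores(snippet_scores, [0.0, 0.0, 0.0])

# A larger logit on the first snippet shifts the video-level score toward it.
focused = temporal_weighted_scores(snippet_scores, [3.0, 0.0, 0.0])
```

With equal logits the result is the plain average of the snippet scores; raising one snippet's logit moves the video-level prediction toward that snippet, which is the sense in which weighting can focus on more relevant segments. In a trained model the logits would be optimized jointly with the network parameters rather than set by hand.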


Pages: 97-108

Page count: 12

50 items in total

[21] Attention-Based Multiscale Spatial-Temporal Convolutional Network for Motor Imagery EEG Decoding [J].

Zhang, Yu ;

Li, Penghai ;

Cheng, Longlong ;

Li, Mingji ;

Li, Hongji .

IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (01) :2423-2434

[22] An Attention Enhanced Spatial-Temporal Graph Convolutional LSTM Network for Action Recognition in Karate [J].

Guo, Jianping ;

Liu, Hong ;

Li, Xi ;

Xu, Dahong ;

Zhang, Yihan .

APPLIED SCIENCES-BASEL, 2021, 11 (18)

[23] Attention-Based Multiview Re-Observation Fusion Network for Skeletal Action Recognition [J].

Fan, Zhaoxuan ;

Zhao, Xu ;

Lin, Tianwei ;

Su, Haisheng .

IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (02) :363-374

[24] Multi-scale spatial-temporal convolutional neural network for skeleton-based action recognition [J].

Cheng, Qin ;

Cheng, Jun ;

Ren, Ziliang ;

Zhang, Qieshi ;

Liu, Jianming .

PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) :1303-1315

[25] Attention-Based Parallel Multiscale Convolutional Neural Network for Visual Evoked Potentials EEG Classification [J].

Gao, Zhongke ;

Sun, Xinlin ;

Liu, Mingxu ;

Dang, Weidong ;

Ma, Chao ;

Chen, Guanrong .

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2021, 25 (08) :2887-2894

[26] Quality Prediction Modeling for Industrial Processes Using Multiscale Attention-Based Convolutional Neural Network [J].

Yuan, Xiaofeng ;

Huang, Lingfeng ;

Ye, Lingjian ;

Wang, Yalin ;

Wang, Kai ;

Yang, Chunhua ;

Gui, Weihua ;

Shen, Feifan .

IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (05) :2696-2707

[27] SCNN: SEQUENTIAL CONVOLUTIONAL NEURAL NETWORK FOR HUMAN ACTION RECOGNITION IN VIDEOS [J].

Yang, Hao ;

Yuan, Chunfeng ;

Xing, Junliang ;

Hu, Weiming .

2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, :355-359

[28] Deep-Aligned Convolutional Neural Network for Skeleton-Based Action Recognition and Segmentation [J].

Hosseini, Babak ;

Montagne, Romain ;

Hammer, Barbara .

DATA SCIENCE AND ENGINEERING, 2020, 5 (02) :126-139

[29] Deep-Aligned Convolutional Neural Network for Skeleton-Based Action Recognition and Segmentation [J].

Babak Hosseini ;

Romain Montagne ;

Barbara Hammer .

Data Science and Engineering, 2020, 5 :126-139

[30] STA-CNN: Convolutional Spatial-Temporal Attention Learning for Action Recognition [J].

Yang, Hao ;

Yuan, Chunfeng ;

Zhang, Li ;

Sun, Yunda ;

Hu, Weiming ;

Maybank, Stephen J. .

IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 :5783-5793
