Multi-head attention-based two-stream EfficientNet for action recognition

Times cited: 21

Authors:

Zhou, Aihua [1,2]

Ma, Yujun [3]

Ji, Wanting [4]

Zong, Ming [5]

Yang, Pei [1,2]

Wu, Min [6]

Liu, Mingzhe [7]

Affiliations:

[1] State Grid Smart Grid Res Inst CO LTD, Beijing, Peoples R China

[2] State Grid Key Lab Informat & Network Secur, Nanjing, Peoples R China

[3] Massey Univ, Sch Math & Computat Sci, Auckland, New Zealand

[4] Liaoning Univ, Sch Informat, Shenyang, Peoples R China

[5] Peking Univ, Natl Engn Res Ctr Software Engn, Beijing, Peoples R China

[6] Beijing Inst Comp Technol & Applicat, Beijing, Peoples R China

[7] Chengdu Univ Technol, State Key Lab Geohazard Prevent & Geoenvironm Pro, Chengdu, Peoples R China

Source:

MULTIMEDIA SYSTEMS | 2023 / Vol. 29 / No. 02

Keywords:

Action recognition; Multi-head attention; Two-stream network; SPATIAL-TEMPORAL ATTENTION; U-NET; NETWORK; SEGMENTATION; KNOWLEDGE

DOI:

10.1007/s00530-022-00961-3

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Recent years have witnessed the popularity of two-stream convolutional neural networks for action recognition. However, existing two-stream convolutional neural network-based approaches fail to distinguish some roughly similar actions in videos, such as sneezing and yawning. To solve this problem, we propose a Multi-head Attention-based Two-stream EfficientNet (MAT-EffNet) for action recognition, which takes advantage of the efficient feature extraction of EfficientNet. The proposed network consists of two streams, a spatial stream and a temporal stream, which first use EfficientNet to extract spatial and temporal features, respectively, from consecutive frames. Then, a multi-head attention mechanism is applied to both streams to capture the key action information in the extracted features. The final prediction is obtained via late average fusion, which averages the softmax scores of the spatial and temporal streams. MAT-EffNet can focus on the key action information in different frames and compute attention multiple times in parallel, which allows it to distinguish similar actions. We evaluate the proposed network on the UCF101, HMDB51, and Kinetics-400 datasets. Experimental results show that MAT-EffNet outperforms other state-of-the-art action recognition approaches.
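The abstract describes a three-step flow: per-frame features in each stream, multi-head attention across frames, and late average fusion of the two streams' softmax scores. The following is a minimal pure-Python sketch of that flow, not the authors' implementation. It assumes random untrained projection weights, temporal mean pooling after attention, a single linear classifier per stream, and no output projection after the heads are concatenated. The helper names (`multi_head_self_attention`, `stream_scores`, `late_average_fusion`) and the placeholder features standing in for EfficientNet outputs (RGB for the spatial stream, optical flow assumed for the temporal stream) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def random_matrix(rows, cols, rng):
    """Hypothetical untrained weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]


def multi_head_self_attention(frames, num_heads, rng):
    """Scaled dot-product self-attention over T frame features.

    frames: T feature vectors of dimension d (d divisible by num_heads).
    Returns T vectors of dimension d, formed by concatenating the heads.
    """
    d = len(frames[0])
    if d % num_heads:
        raise ValueError("feature dimension must be divisible by num_heads")
    d_head = d // num_heads
    outputs = [[] for _ in frames]
    for _ in range(num_heads):
        # Each head has its own query/key/value projections.
        w_q = random_matrix(d_head, d, rng)
        w_k = random_matrix(d_head, d, rng)
        w_v = random_matrix(d_head, d, rng)
        q = [matvec(w_q, f) for f in frames]
        k = [matvec(w_k, f) for f in frames]
        v = [matvec(w_v, f) for f in frames]
        for t, q_t in enumerate(q):
            # How much frame t attends to every frame in the clip.
            scores = [sum(a * b for a, b in zip(q_t, k_s)) / math.sqrt(d_head)
                      for k_s in k]
            weights = softmax(scores)
            head_out = [sum(w * v_s[j] for w, v_s in zip(weights, v))
                        for j in range(d_head)]
            outputs[t].extend(head_out)
    return outputs


def stream_scores(frames, num_heads, num_classes, rng):
    """One stream: attention, temporal mean pooling, linear layer, softmax."""
    attended = multi_head_self_attention(frames, num_heads, rng)
    d = len(attended[0])
    pooled = [sum(vec[j] for vec in attended) / len(attended) for j in range(d)]
    w_cls = random_matrix(num_classes, d, rng)
    return softmax(matvec(w_cls, pooled))


def late_average_fusion(spatial_probs, temporal_probs):
    """Average the spatial and temporal softmax scores class by class."""
    return [(s + t) / 2 for s, t in zip(spatial_probs, temporal_probs)]


if __name__ == "__main__":
    rng = random.Random(0)
    T, D, HEADS, CLASSES = 8, 16, 4, 5
    # Placeholder per-frame features; a real model would use EfficientNet outputs.
    rgb = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
    flow = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
    p_spatial = stream_scores(rgb, HEADS, CLASSES, rng)
    p_temporal = stream_scores(flow, HEADS, CLASSES, rng)
    fused = late_average_fusion(p_spatial, p_temporal)
    print("predicted class:", max(range(CLASSES), key=fused.__getitem__))
```

Averaging probabilities rather than logits keeps each stream's contribution on the same [0, 1] scale, so the fused vector is still a valid distribution. In a trained model, the random projections would be replaced by learned weights.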


Pages: 487-498

Number of pages: 12

Similar records (50 in total):

[1] Multi-head attention-based two-stream EfficientNet for action recognition
Aihua Zhou
Yujun Ma
Wanting Ji
Ming Zong
Pei Yang
Min Wu
Mingzhe Liu
Multimedia Systems, 2023, 29 : 487 - 498
[2] Cascade multi-head attention networks for action recognition
Wang, Jiaze
Peng, Xiaojiang
Qiao, Yu
COMPUTER VISION AND IMAGE UNDERSTANDING, 2020, 192
[3] Multi-Head Attention-Based Spectrum Sensing for Radio
Devarakonda, B. V. Ravisankar
Nandanavam, Venkateswararao
INTERNATIONAL JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING SYSTEMS, 2023, 14 (02) : 135 - 143
[4] Human action recognition using two-stream attention based LSTM networks
Dai, Cheng
Liu, Xingang
Lai, Jinfeng
APPLIED SOFT COMPUTING, 2020, 86
[5] Two-stream Graph Attention Convolutional for Video Action Recognition
Zhang, Deyuan
Gao, Hongwei
Dai, Hailong
Shi, Xiangbin
2021 IEEE 15TH INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING (BIGDATASE 2021), 2021 : 23 - 27
[6] Multiscaled Multi-Head Attention-Based Video Transformer Network for Hand Gesture Recognition
Garg, Mallika
Ghosh, Debashis
Pradhan, Pyari Mohan
IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 80 - 84
[7] A novel two-stream multi-head self-attention convolutional neural network for bearing fault diagnosis
Ren, Hang
Liu, Shaogang
Wei, Fengmei
Qiu, Bo
Zhao, Dan
PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART C-JOURNAL OF MECHANICAL ENGINEERING SCIENCE, 2024, 238 (11) : 5393 - 5405
[8] Improving CRNN with EfficientNet-like feature extractor and multi-head attention for text recognition
Dinh Viet Sang
Le Tran Bao Cuong
SOICT 2019: PROCEEDINGS OF THE TENTH INTERNATIONAL SYMPOSIUM ON INFORMATION AND COMMUNICATION TECHNOLOGY, 2019 : 285 - 290
[9] A multi-head adjacent attention-based pyramid layered model for nested named entity recognition
Shengmin Cui
Inwhee Joe
Neural Computing and Applications, 2023, 35 : 2561 - 2574
[10] Two-Stream Adaptive Attention Graph Convolutional Networks for Action Recognition
Du Q.
Xiang Z.
Tian L.
Yu L.
Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2022, 50 (12) : 20 - 29
