Spatiotemporal Self-Attention Mechanism Driven by 3D Pose to Guide RGB Cues for Daily Living Human Activity Recognition

Cited by: 2
Authors
Basly, Hend [1 ]
Zayene, Mohamed Amine [1 ]
Sayadi, Fatma Ezahra [1 ]
Affiliations
[1] Natl Engn Sch Sousse ENISO, NOCCS Lab Networked Objects Control & Commun Syst, BP 264, Erriadh 4023, Sousse, Tunisia
Funding
UK Research & Innovation;
Keywords
Transformer; Self-attention mechanism; Daily living activity recognition; Bilinear pooling attention; NETWORK;
DOI
10.1007/s10846-023-01926-y
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The field of human activity recognition is evolving at a rapid pace. Over the last two decades, many approaches have been proposed to recognize human activities in generic videos, but they remain limited for daily-living videos, whose characteristics make them considerably harder to handle. Such videos present several challenges to overcome: camera-view variations, temporal-information representation, low inter-class variation between similar actions, fine-grained action representation, and high intra-class variation. Recognizing an action generally requires extracting spatial and temporal information from the video. To extract temporal information, many works based on the LSTM network have been published; although they have proven their potential in this field, they fail to model long-range temporal dependencies in very long video sequences. We therefore turn to Transformer networks and propose a new pose-guided self-attention mechanism combined with a 3D convolutional neural network (3D CNN) through a Bilinear Pooling Attention (BPA) module, which allows the spatio-temporal skeleton features to recalibrate the RGB features for Daily Living Activity (DLA) recognition. In addition, most commonly used datasets are static and do not exhibit strong motion variation over time; we therefore evaluate on the large-scale NTU RGB+D dataset, since it contains RGB-D human actions that evolve much more over time. Experimental results demonstrate that our spatio-temporal self-attention mechanism combined with a 3D CNN through the BPA module (ST-SA-BPA) outperforms state-of-the-art methods.
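The recalibration idea described in the abstract — skeleton features gating the RGB stream through bilinear pooling — can be sketched for a single pair of feature vectors as follows. This is an illustrative assumption, not the paper's exact formulation: the function name, feature shapes, random stand-in weights, and the sigmoid gate are all hypothetical choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def bilinear_pooling_attention(rgb_feat, pose_feat, w_attn):
    """Hypothetical sketch of a Bilinear Pooling Attention (BPA) step.

    The outer product captures pairwise pose-RGB interactions, a
    (normally learned) projection maps them to one attention weight
    per RGB channel, and a sigmoid gate rescales the RGB stream.
    """
    # Bilinear pooling: all pairwise pose-RGB interactions, flattened.
    bilinear = np.outer(pose_feat, rgb_feat).ravel()   # shape (C_p * C_r,)
    # Project the interactions to per-channel attention logits.
    logits = w_attn @ bilinear                         # shape (C_r,)
    attn = 1.0 / (1.0 + np.exp(-logits))               # sigmoid gate in (0, 1)
    # Channel-wise recalibration of the RGB features.
    return rgb_feat * attn, attn

c_rgb, c_pose = 8, 6
rgb = rng.standard_normal(c_rgb)                       # stand-in RGB features
pose = rng.standard_normal(c_pose)                     # stand-in skeleton features
w = 0.1 * rng.standard_normal((c_rgb, c_pose * c_rgb)) # stand-in for learned weights

recalibrated, attn = bilinear_pooling_attention(rgb, pose, w)
```

In a full model these vectors would be per-frame feature maps from the 3D CNN and the pose branch, and `w_attn` would be trained end-to-end; the sketch only shows the gating arithmetic.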
Pages: 14
Related Papers
64 in total
  • [1] Araei S., 2021, 26 INT COMP C COMP S, P1
  • [2] Baradel F., 2018, BMVC 2018 29 BRIT MA, P1
  • [3] Baradel F., Wolf C., Mille J., Taylor G.W. Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 469-478
  • [4] Baradel F., Wolf C., Mille J. Human Action Recognition: Pose-based Attention Draws Focus to Hands. IEEE International Conference on Computer Vision Workshops (ICCVW), 2017: 604-613
  • [5] Basly H., 2020, Image and Signal Processing, ICISP 2020, LNCS 12119, P271, DOI 10.1007/978-3-030-51935-3_29
  • [6] Basly H., Ouarda W., Sayadi F.E., Ouni B., Alimi A.M. LAHAR-CNN: Human Activity Recognition from One Image Using Convolutional Neural Network Learning Approach. International Journal of Biometrics, 2021, 13(4): 385-408
  • [7] Basly H., Ouarda W., Sayadi F.E., Ouni B., Alimi A.M. DTR-HAR: Deep Temporal Residual Representation for Human Activity Recognition. The Visual Computer, 2022, 38(3): 993-1013
  • [8] Carreira J., Zisserman A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017: 4724-4733
  • [9] Chen K., Yao L., Zhang D., Wang X., Chang X., Nie F. A Semisupervised Recurrent Convolutional Attention Model for Human Activity Recognition. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(5): 1747-1756
  • [10] Cheron G., Laptev I., Schmid C. P-CNN: Pose-based CNN Features for Action Recognition. IEEE International Conference on Computer Vision (ICCV), 2015: 3218-3226