Still image action recognition based on interactions between joints and objects

Cited by: 4

Authors:

Ashrafi, Seyed Sajad [1]

Shokouhi, Shahriar B. [1]

Ayatollahi, Ahmad [1]

Affiliations:

[1] Iran Univ Sci & Technol IUST, Elect Engn Dept, Tehran, Iran

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2023 / Vol. 82 / Issue 17

Keywords:

Still image-based action recognition; Self-attention; Cross-attention; Convolutional neural networks (CNN); Atrous spatial pyramid pooling (ASPP)

DOI:

10.1007/s11042-023-14350-z

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Still image-based action recognition is a challenging task in which an action must be recognized from a single input image. A common technique in this field is to use auxiliary information such as pose, objects, or background. However, the simultaneous use of several auxiliary components, and the best way to combine them, has received less study. In this study, two cues, body joints and objects, are used simultaneously, and an attention module is proposed to combine their features. The attention module consists of two self-attentions and one cross-attention, designed to account for the interactions between objects, between joints, and between joints and objects, respectively. In addition, a Multi-scale Atrous Spatial Pyramid Pooling (MASPP) module is proposed to reduce the number of parameters of the method while combining features from different levels of the backbone. A Joint Object Pooling (JOPool) module is proposed to extract local features from joint and object regions. ResNets are used as the backbone, with the stride of the last two layers changed. Experimental results on several datasets show that combining several auxiliary components can increase the mean Average Precision (mAP) of recognition. The proposed method is evaluated on three important datasets, Stanford-40, PASCAL VOC 2012, and BU101PLUS, achieving mAPs of 94.84%, 93.20%, and 91.25%, respectively. These mAPs are higher than those of the best previously proposed methods.
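The abstract does not give the internals of the attention module, so the following is only a minimal sketch of the general idea: scaled dot-product attention applied within the joint features, within the object features, and from joints to objects. The function names (`attention`, `fuse_joints_and_objects`), the absence of learned query/key/value projections, and the mean-pool-then-concatenate fusion are all assumptions made for illustration, not the authors' design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of feature vectors.

    Each query attends to every key; the result for a query is the
    attention-weighted average of the value vectors.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    count = len(vectors)
    return [sum(v[j] for v in vectors) / count for j in range(len(vectors[0]))]


def fuse_joints_and_objects(joints, objects):
    """Hypothetical fusion: two self-attentions plus one cross-attention.

    Assumption: joints act as queries in the cross-attention and the
    three branches are mean-pooled and concatenated.
    """
    joint_self = attention(joints, joints, joints)        # joint-joint
    object_self = attention(objects, objects, objects)    # object-object
    joint_object = attention(joints, objects, objects)    # joint-object
    return mean_pool(joint_self) + mean_pool(object_self) + mean_pool(joint_object)


# Toy local features: 3 joints and 2 objects, each a 4-dimensional vector.
joints = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
objects = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
fused = fuse_joints_and_objects(joints, objects)
```

In this toy setup the fused descriptor has three times the input feature dimension, one slice per attention branch. A trained model would normally add learned projections and a classifier on top, which are omitted here.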


Pages: 25945-25971

Page count: 27
