Still image action recognition based on interactions between joints and objects

Cited by: 4
Authors
Ashrafi, Seyed Sajad [1]
Shokouhi, Shahriar B. [1]
Ayatollahi, Ahmad [1]
Affiliations
[1] Iran Univ Sci & Technol IUST, Elect Engn Dept, Tehran, Iran
Keywords
Still image-based action recognition; Self-attention; Cross-attention; Convolutional neural networks (CNN); Atrous spatial pyramid pooling (ASPP);
DOI
10.1007/s11042-023-14350-z
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Still image-based action recognition is challenging because the action must be recognized from a single input image. A common technique in this field is to exploit auxiliary information such as pose, objects, or background. However, the simultaneous use of several auxiliary components, and how best to combine them, has received less study. In this study, two cues, body joints and objects, are employed simultaneously, and an attention module is proposed to combine their features. The attention module consists of two self-attentions and a cross-attention, which are designed to account for the interactions between the objects, between the joints, and between the joints and objects, respectively. In addition, the Multi-scale Atrous Spatial Pyramid Pooling (MASPP) module is proposed to reduce the number of parameters of the method while combining the features obtained from different levels of the backbone. The Joint Object Pooling (JOPool) module is proposed to extract local features from the joint and object regions. ResNets are used as the backbone, with the stride of the last two layers changed. Experimental results on different datasets show that combining several auxiliary components can be effective in increasing the mean Average Precision (mAP) of recognition. The proposed method is evaluated on three important datasets, Stanford-40, PASCAL VOC 2012, and BU101PLUS, achieving mAPs of 94.84%, 93.20%, and 91.25%, respectively. These mAPs are higher than those of the best previously proposed methods.
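The attention module described in the abstract (a self-attention over objects, a self-attention over joints, and a cross-attention between joints and objects) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, treats joints as the queries of the cross-attention, and fuses by simple concatenation. The names `attention` and `fuse_joints_and_objects` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of feature vectors.

    Each query attends to every key; the output for a query is the
    attention-weighted average of the value vectors.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values))
             for j in range(len(values[0]))]
        )
    return outputs


def fuse_joints_and_objects(joints, objects):
    """Hypothetical fusion: two self-attentions followed by one cross-attention."""
    objects_sa = attention(objects, objects, objects)  # object-object interaction
    joints_sa = attention(joints, joints, joints)      # joint-joint interaction
    # Assumption: joints query the objects (joint-object interaction).
    cross = attention(joints_sa, objects_sa, objects_sa)
    # Assumption: concatenate each joint feature with its object context.
    return [j + c for j, c in zip(joints_sa, cross)]


# Toy features: 3 joints and 2 objects, each a 2-dimensional vector.
joints = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
objects = [[0.5, 0.5], [2.0, 0.0]]
fused = fuse_joints_and_objects(joints, objects)
```

In this sketch each joint ends up with a feature vector twice its original length: its self-attended feature followed by the object context gathered through the cross-attention. A real model would add learned query, key, and value projections and would operate on features pooled from the joint and object regions.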
Pages: 25945-25971
Page count: 27