Relation with Free Objects for Action Recognition

Cited by: 2
Authors
Liang, Shuang [1 ]
Ma, Wentao [1 ]
Xie, Chi [1 ]
Affiliations
[1] Tongji Univ, 4800 Caoan Rd, Shanghai 201800, Peoples R China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China;
Keywords
Action recognition; relation; object detection;
DOI
10.1145/3617596
Chinese Library Classification (CLC)
TP [Automation and Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Relevant objects are widely used to aid human action recognition in still images. In all previous methods, such objects are found by a dedicated, pre-trained object detector. These methods have two drawbacks. First, training an object detector requires intensive data annotation, which is costly and sometimes unaffordable in practice. Second, the relation between objects and humans is not fully taken into account during training. This work proposes a systematic approach to address both problems. We propose two novel network modules. The first is an object extraction module that automatically finds objects relevant to action recognition without requiring annotations; it is therefore annotation-free. The second is a human-object relation module that models the pairwise relations between humans and objects and enhances their features. Both modules are trained end-to-end within the action recognition network. Comprehensive experiments and ablation studies on three datasets for action recognition in still images demonstrate the effectiveness of the proposed approach. Our method yields state-of-the-art results. Specifically, on the HICO dataset it achieves 44.9% mAP, a 12% relative improvement over the previous best result. In addition, this work makes the observation that a pre-trained object detector is no longer necessary for this task: relevant objects can be found via end-to-end learning with only action labels. This is encouraging for action recognition in the wild. Models and code will be released.
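The abstract does not specify the internals of the human-object relation module; a common way to model such pairwise relations is attention-style weighting, where each candidate object's contribution to a human feature is scored by their affinity. The sketch below is a hypothetical, minimal NumPy illustration of that general idea; the function name, shapes, and fusion-by-addition choice are assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_enhance(human_feat, object_feats):
    """Hypothetical pairwise relation step: weight candidate object
    features by their scaled dot-product affinity with a human feature,
    then fuse the aggregated object context back into the human feature.

    human_feat:   (d,)  feature vector for one person
    object_feats: (k, d) feature vectors for k candidate objects
    Returns the enhanced human feature and the attention weights.
    """
    d = human_feat.shape[-1]
    scores = object_feats @ human_feat / np.sqrt(d)   # (k,) pairwise affinities
    weights = softmax(scores)                         # attention over objects
    context = weights @ object_feats                  # (d,) aggregated object context
    return human_feat + context, weights
```

With a uniform human feature and orthonormal object features, all affinities tie, so the weights are uniform; in a trained network the affinities would instead be learned jointly with the action labels, which is what lets the model find relevant objects without object annotations.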
Pages: 19
Cited References
47 records
  • [1] Advances in human action recognition: an updated survey
    Abu-Bakar, Syed A. R.
    [J]. IET IMAGE PROCESSING, 2019, 13 (13) : 2381 - 2394
  • [2] 2D Human Pose Estimation: New Benchmark and State of the Art Analysis
    Andriluka, Mykhaylo
    Pishchulin, Leonid
    Gehler, Peter
    Schiele, Bernt
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 3686 - 3693
  • [3] Still image action recognition based on interactions between joints and objects
    Ashrafi, Seyed Sajad
    Shokouhi, Shahriar B.
    Ayatollahi, Ahmad
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (17) : 25945 - 25971
  • [4] Learning to Detect Human-Object Interactions
    Chao, Yu-Wei
    Liu, Yunfan
    Liu, Xieyang
    Zeng, Huayi
    Deng, Jia
    [J]. 2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 381 - 389
  • [5] HICO: A Benchmark for Recognizing Human-Object Interactions in Images
    Chao, Yu-Wei
    Wang, Zhan
    He, Yugeng
    Wang, Jiaxuan
    Deng, Jia
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 1017 - 1025
  • [6] Gao, Chen, 2018, Proceedings of the British Machine Vision Conference (BMVC)
  • [7] Chen TQ, 2015, arXiv, DOI [arXiv:1512.01274, 10.48550/arxiv.1512.01274]
  • [8] Multi-expert human action recognition with hierarchical super-class learning
    Dehkordi, Hojat Asgarian
    Nezhad, Ali Soltani
    Kashiani, Hossein
    Shokouhi, Shahriar Baradaran
    Ayatollahi, Ahmad
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 250
  • [9] The Pascal Visual Object Classes (VOC) Challenge
    Everingham, Mark
    Van Gool, Luc
    Williams, Christopher K. I.
    Winn, John
    Zisserman, Andrew
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2010, 88 (02) : 303 - 338
  • [10] Pairwise Body-Part Attention for Recognizing Human-Object Interactions
    Fang, Hao-Shu
    Cao, Jinkun
    Tai, Yu-Wing
    Lu, Cewu
    [J]. COMPUTER VISION - ECCV 2018, PT X, 2018, 11214 : 52 - 68