Relation with Free Objects for Action Recognition

Cited by: 2
Authors
Liang, Shuang [1 ]
Ma, Wentao [1 ]
Xie, Chi [1 ]
Affiliations
[1] Tongji Univ, 4800 Caoan Rd, Shanghai 201800, Peoples R China
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China;
Keywords
Action recognition; relation; object detection;
DOI
10.1145/3617596
Chinese Library Classification (CLC)
TP [Automation and Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Relevant objects are widely used to aid human action recognition in still images. In all previous methods, such objects are found by a dedicated, pre-trained object detector. These methods have two drawbacks. First, training an object detector requires intensive data annotation, which is costly and sometimes unaffordable in practice. Second, the relation between objects and humans is not fully taken into account during training. This work proposes a systematic approach to address both problems. We propose two novel network modules. The first is an object extraction module that automatically finds objects relevant to action recognition without requiring annotations; it is therefore annotation-free. The second is a human-object relation module that models the pairwise relations between humans and objects and enhances their features. Both modules are trained end-to-end within the action recognition network. Comprehensive experiments and ablation studies on three datasets for action recognition in still images demonstrate the effectiveness of the proposed approach. Our method yields state-of-the-art results. Specifically, on the HICO dataset it achieves 44.9% mAP, a 12% relative improvement over the previous best result. In addition, this work makes the observation that a pre-trained object detector is no longer necessary for this task: relevant objects can be found via end-to-end learning with only action labels. This is encouraging for action recognition in the wild. Models and code will be released.
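The abstract does not specify the internals of the human-object relation module; a common way to model such pairwise relations is attention-style weighting, where each candidate object's contribution to a human feature is scored by their affinity. The sketch below is a hypothetical, minimal NumPy illustration of that general idea; the function name, shapes, and fusion-by-addition choice are assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_enhance(human_feat, object_feats):
    """Hypothetical pairwise relation step: weight candidate object
    features by their scaled dot-product affinity with a human feature,
    then fuse the aggregated object context back into the human feature.

    human_feat:   (d,)  feature vector for one person
    object_feats: (k, d) feature vectors for k candidate objects
    Returns the enhanced human feature and the attention weights.
    """
    d = human_feat.shape[-1]
    scores = object_feats @ human_feat / np.sqrt(d)   # (k,) pairwise affinities
    weights = softmax(scores)                         # attention over objects
    context = weights @ object_feats                  # (d,) aggregated object context
    return human_feat + context, weights
```

With a uniform human feature and orthonormal object features, all affinities tie, so the weights are uniform; in a trained network the affinities would instead be learned jointly with the action labels, which is what lets the model find relevant objects without object annotations.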
Pages: 19
Cited References
47 records
  • [1] Advances in human action recognition: an updated survey
    Abu-Bakar, Syed A. R.
    [J]. IET IMAGE PROCESSING, 2019, 13 (13) : 2381 - 2394
  • [2] 2D Human Pose Estimation: New Benchmark and State of the Art Analysis
    Andriluka, Mykhaylo
    Pishchulin, Leonid
    Gehler, Peter
    Schiele, Bernt
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 3686 - 3693
  • [3] Still image action recognition based on interactions between joints and objects
    Ashrafi, Seyed Sajad
    Shokouhi, Shahriar B.
    Ayatollahi, Ahmad
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (17) : 25945 - 25971
  • [4] Learning to Detect Human-Object Interactions
    Chao, Yu-Wei
    Liu, Yunfan
    Liu, Xieyang
    Zeng, Huayi
    Deng, Jia
    [J]. 2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 381 - 389
  • [5] HICO: A Benchmark for Recognizing Human-Object Interactions in Images
    Chao, Yu-Wei
    Wang, Zhan
    He, Yugeng
    Wang, Jiaxuan
    Deng, Jia
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 1017 - 1025
  • [6] Gao, Chen, 2018, Proceedings of the British Machine Vision Conference (BMVC)
  • [7] Chen TQ, 2015, arXiv, DOI [arXiv:1512.01274, 10.48550/arxiv.1512.01274]
  • [8] Multi-expert human action recognition with hierarchical super-class learning
    Dehkordi, Hojat Asgarian
    Nezhad, Ali Soltani
    Kashiani, Hossein
    Shokouhi, Shahriar Baradaran
    Ayatollahi, Ahmad
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 250
  • [9] The Pascal Visual Object Classes (VOC) Challenge
    Everingham, Mark
    Van Gool, Luc
    Williams, Christopher K. I.
    Winn, John
    Zisserman, Andrew
    [J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2010, 88 (02) : 303 - 338
  • [10] Pairwise Body-Part Attention for Recognizing Human-Object Interactions
    Fang, Hao-Shu
    Cao, Jinkun
    Tai, Yu-Wing
    Lu, Cewu
    [J]. COMPUTER VISION - ECCV 2018, PT X, 2018, 11214 : 52 - 68