Automatically detecting human-object interaction by an instance part-level attention deep framework

Cited by: 5

Authors:

Bai, Lin [1]

Chen, Fenglian [1]

Tian, Yang [1]

Affiliation:

[1] Guangxi Univ, Sch Comp Elect & Informat, Nanning 530004, Guangxi, Peoples R China

Source:

PATTERN RECOGNITION | 2022 / Vol. 134

Funding:

National Natural Science Foundation of China;

Keywords:

Human-object interaction; Instance part-level correlations; Self-attention-based model; Image context;

DOI:

10.1016/j.patcog.2022.109110

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatically detecting human-object interactions (HOIs) from an image is a very important but challenging task in computer vision. One of the significant problems in HOI detection is that similar human-object interactions are difficult to distinguish. Recently, many instance-centric HOI detection schemes, based on appearance features and coarse spatial information, have been proposed. These methods, however, lack the capacity of capturing and analyzing the fine-grained context between human poses and object parts, which plays a crucial role in HOI detection. To address these problems, we propose a novel instance part-level attention deep framework for HOI detection. Specifically, our approach consists of a human/object-part detection phase and an HOI detection phase. In the former phase, a part-level visual pattern estimation model is designed for capturing the fine-grained human body parts and object parts. In the latter phase, a self-attention-based deep network is proposed to learn the visual composite around the human-object pair that implicitly expresses the consistent spatial, scale, co-occurrence, and viewpoint relationships among human body parts and object parts across images, which are effective for predicting HOI. To the best of our knowledge, we are the first to propose a framework where the fine-grained part-level mutual context of a human-object pair is extracted to improve HOI detection. By comparing our approach with state-of-the-art HOI detection methods on benchmark datasets, we demonstrated that our proposed framework outperformed the existing HOI detection methods, such as significantly improving the performance of part-level visual pattern estimation, HOI detection, and the quality of the self-attention-based deep network structure. (c) 2022 Elsevier Ltd. All rights reserved.
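
As a rough illustration of the self-attention idea mentioned in the abstract, the sketch below applies plain scaled dot-product self-attention to a handful of part-level feature vectors. It is not the authors' network: the part names, the feature values, and the use of identity query/key/value projections are all assumptions made for illustration, and a real model would learn those projections from data.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), which keeps the sketch dependency-free.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = []
    for q in tokens:
        # Similarity of this part to every part (including itself).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(scores))
    # Each output is an attention-weighted mix of all part features.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical part-level features: two human parts and two object parts.
part_names = ["human_hand", "human_head", "object_handle", "object_body"]
part_features = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.2],
    [0.9, 0.1, 0.4],
    [0.1, 0.8, 0.3],
]

context, attn = self_attention(part_features)
for name, row in zip(part_names, attn):
    print(name, [round(w, 3) for w in row])
```

In this toy setup, the row for `human_hand` places more weight on `object_handle` than on `object_body`, simply because their made-up feature vectors are more similar; it shows the mechanism by which part-to-part context can be mixed, not any result from the paper.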

Pages: 13