Automatically detecting human-object interaction by an instance part-level attention deep framework

Cited by: 5
Authors
Bai, Lin [1]
Chen, Fenglian [1]
Tian, Yang [1]
Affiliations
[1] Guangxi Univ, Sch Comp Elect & Informat, Nanning 530004, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Human-object interaction; Instance part-level correlations; Self-attention-based model; Image context
DOI
10.1016/j.patcog.2022.109110
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatically detecting human-object interactions (HOIs) from an image is a very important but challenging task in computer vision. One of the significant problems in HOI detection is that similar human-object interactions are difficult to distinguish. Recently, many instance-centric HOI detection schemes, based on appearance features and coarse spatial information, have been proposed. These methods, however, lack the capacity of capturing and analyzing the fine-grained context between human poses and object parts, which plays a crucial role in HOI detection. To address these problems, we propose a novel instance part-level attention deep framework for HOI detection. Specifically, our approach consists of a human/object-part detection phase and an HOI detection phase. In the former phase, a part-level visual pattern estimation model is designed for capturing the fine-grained human body parts and object parts. In the latter phase, a self-attention-based deep network is proposed to learn the visual composite around the human-object pair that implicitly expresses the consistent spatial, scale, co-occurrence, and viewpoint relationships among human body parts and object parts across images, which are effective for predicting HOI. To the best of our knowledge, we are the first to propose a framework where the fine-grained part-level mutual context of a human-object pair is extracted to improve HOI detection. By comparing our approach with state-of-the-art HOI detection methods on benchmark datasets, we demonstrated that our proposed framework outperformed the existing HOI detection methods, such as significantly improving the performance of part-level visual pattern estimation, HOI detection, and the quality of the self-attention-based deep network structure. © 2022 Elsevier Ltd. All rights reserved.
Pages: 13