Human-object interaction detection with depth-augmented clues

Cited by: 4
Authors
Cheng, Yamin [1]
Duan, Hancong [1]
Wang, Chen [1]
Wang, Zhi [1]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Human-object interaction; Depth map; NETWORK; ATTENTION; GENERATION
DOI
10.1016/j.neucom.2022.05.014
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Human-object interaction (HOI) detection aims to localize and classify triplets of human, object and relationship in a given image. Unlike previous methods, which extract visual information only from RGB images, we propose a Depth-augmented Relationship Reasoning (DRR) method that attends to RGB images and their corresponding depth information simultaneously. Rethinking the principles of photography, we argue that RGB images discard spatial depth, which carries third-dimension relative distance information between instances. In light of this, we first estimate the depth information for each image, yielding a corresponding depth map. We then leverage multiple representations encoded from depth information and RGB images to enrich semantic interpretation. Subsequently, we explore a hierarchical attention strategy to fuse these semantic representations and generate depth-augmented features, which are used to reason about fine-grained human-object interactions. Extensive experiments on the benchmark datasets V-COCO, HICO-DET and HCVRD verify the effectiveness of our method and demonstrate the importance of spatial depth information for HOI.
Pages: 978-988
Page count: 11