Hierarchical Reasoning Network for Human-Object Interaction Detection

被引:13
作者
Gao, Yiming [1 ]
Kuang, Zhanghui [2 ]
Li, Guanbin [1 ]
Zhang, Wayne [2 ]
Lin, Liang [1 ]
机构
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] SenseTime Res, Hong Kong, Peoples R China
基金
中国国家自然科学基金;
关键词
Visualization; Cognition; Correlation; Benchmark testing; Task analysis; Sports; Periodic structures; Human-object interaction; hierarchical reasoning network; graph neural network; REPRESENTATION; CNNS;
D O I
10.1109/TIP.2021.3093784
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Human-object interaction detection that aims at detecting <human, verb, object> triplets is critical for the holistic human-centric scene understanding. Existing approaches ignore the modeling of correlations among hierarchical human parts and objects. In this work, we introduce a Hierarchical Reasoning Network (HRNet) to capture relations among human parts at multiple scales (including the holistic human, human region, and human keypoint levels) and objects via a unified graph. In particular, HRNet first constructs one multi-level human parts graph, each level of which consists of human parts at one specific scale, objects, and the unions of human part-object pairs as nodes, and their mutual visual and spatial layout relations as intra-level reasoning. To also capture the relations across scales, we further introduce inter-level reasoning between the nodes of two consecutive levels based on the prior of human body structure. The representations of graph nodes are propagated along intra-level and inter-level reasoning in turn during reasoning. Extensive experiments demonstrate our HRNet obtains new state-of-the-art results on three challenging HICO-DET, V-COCO and HOI-A benchmarks, validating the compelling effectiveness of the proposed method.
引用
收藏
页码:8306 / 8317
页数:12
相关论文
共 72 条
[1]   Object Level Visual Reasoning in Videos [J].
Baradel, Fabien ;
Neverova, Natalia ;
Wolf, Christian ;
Mille, Julien ;
Mori, Greg .
COMPUTER VISION - ECCV 2018, PT XIII, 2018, 11217 :106-122
[2]  
Heilbron FC, 2015, PROC CVPR IEEE, P961, DOI 10.1109/CVPR.2015.7298698
[3]   Cascade R-CNN: Delving into High Quality Object Detection [J].
Cai, Zhaowei ;
Vasconcelos, Nuno .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :6154-6162
[4]   Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields [J].
Cao, Zhe ;
Simon, Tomas ;
Wei, Shih-En ;
Sheikh, Yaser .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1302-1310
[5]   Learning to Detect Human-Object Interactions [J].
Chao, Yu-Wei ;
Liu, Yunfan ;
Liu, Xieyang ;
Zeng, Huayi ;
Deng, Jia .
2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, :381-389
[6]  
Chen Gao, 2020, Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12357), P696, DOI 10.1007/978-3-030-58610-2_41
[7]   Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding [J].
Chen, Tianshui ;
Wu, Wenxi ;
Gao, Yuefang ;
Dong, Le ;
Luo, Xiaonan ;
Lin, Liang .
PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, :2023-2031
[8]  
Chen TS, 2018, PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, P627
[9]   Iterative Visual Reasoning Beyond Convolutions [J].
Chen, Xinlei ;
Li, Li-Jia ;
Li Fei-Fei ;
Gupta, Abhinav .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :7239-7248
[10]  
Dong-Jin Kim, 2020, Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12366), P718, DOI 10.1007/978-3-030-58589-1_43