Multi-stream Network for Human-object Interaction Detection

Cited by: 4
Authors
Wang, Chang [1]
Sun, Jinyu [1]
Ma, Shiwei [1]
Lu, Yuqiu [1]
Liu, Wang [1]
Affiliations
[1] Shanghai Univ, Sch Mechatron Engn & Automat, Shanghai 200444, Peoples R China
Keywords
Four-stream network; human-object interaction; visual features; spatial features; human pose; intersection
DOI
10.1142/S0218001421500257
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Detecting interactions between humans and objects in images is critical both for gaining a deeper understanding of the visual relationships in a scene and for many practical applications, such as augmented reality, video surveillance and information retrieval. However, because real scenes contain fine-grained actions and objects, and multiple interactions often coexist in one scene, the problem is far from solved. Unlike prior approaches, which focused only on the features of instances, this paper proposes a four-stream CNN for human-object interaction (HOI) detection. More detailed visual, spatial and pose features are extracted from human-object pairs to address this challenging detection task. Specifically, the core idea is that the region where people interact with objects contains important identifying cues for specific action classes, and that these detailed cues can be fused to facilitate HOI recognition. Experiments on two large-scale public HOI benchmarks, V-COCO and HICO-DET, show the effectiveness of the proposed method.
Pages: 16