Modeling 4D Human-Object Interactions for Event and Object Recognition

Cited by: 68

Authors:

Wei, Ping ^{[1,2]}

Zhao, Yibiao ^{[2]}

Zheng, Nanning ^{[1]}

Zhu, Song-Chun ^{[2]}

Affiliations:

[1] Xi An Jiao Tong Univ, Xian, Peoples R China

[2] Univ Calif Los Angeles, Los Angeles, CA USA

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) | 2013

Keywords:

DOI:

10.1109/ICCV.2013.406

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recognizing events and recognizing objects in video sequences are two challenging tasks because of complex temporal structures and large appearance variations. In this paper, we propose a 4D human-object interaction model in which the two tasks jointly boost each other. Our human-object interaction is defined in 4D space: i) the co-occurrence and geometric constraints of human pose and objects in 3D space; ii) sub-event transitions and object coherence in the 1D temporal dimension. We represent the structure of events, sub-events, and objects in a hierarchical graph. For an input RGB-depth video, we design a dynamic programming beam search algorithm to i) segment the video, ii) recognize the events, and iii) detect the objects simultaneously. For evaluation, we built a large-scale multiview 3D event dataset that contains 3815 video sequences and 383,036 RGBD frames captured by Kinect cameras. Experimental results on this dataset show the effectiveness of our method.
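
The abstract names a dynamic programming beam search for temporal segmentation but gives no details. As a rough illustration only, the following sketch shows how such a search might segment a frame sequence into labeled sub-events. It is not the authors' algorithm: the function `beam_segment`, the per-frame score dictionaries, the transition table, and the labels `reach` and `drink` are all hypothetical, and the 3D pose and object terms of the paper's model are left out.

```python
from typing import Dict, List, Tuple

# (start_frame, end_frame_inclusive, label); a hypothetical representation
Segment = Tuple[int, int, str]


def beam_segment(
    frame_scores: List[Dict[str, float]],
    transition: Dict[str, Dict[str, float]],
    beam_width: int = 3,
) -> Tuple[float, List[Segment]]:
    """Label every frame and group equal neighbors into segments.

    Total score = sum of per-frame label scores plus a transition score
    each time the label changes. All inputs are illustrative assumptions.
    """
    if not frame_scores:
        return 0.0, []
    labels = list(transition)
    # beam maps the label of the current segment to the best
    # (score, segments) hypothesis ending in that label.
    beam: Dict[str, Tuple[float, List[Segment]]] = {}
    for t, scores in enumerate(frame_scores):
        candidates: Dict[str, Tuple[float, List[Segment]]] = {}
        if t == 0:
            for lab in labels:
                candidates[lab] = (scores[lab], [(0, 0, lab)])
        else:
            for prev, (score, segs) in beam.items():
                for lab in labels:
                    if lab == prev:
                        # Extend the current segment by one frame.
                        new_score = score + scores[lab]
                        new_segs = segs[:-1] + [(segs[-1][0], t, lab)]
                    else:
                        # Close the current segment and open a new one.
                        new_score = score + transition[prev][lab] + scores[lab]
                        new_segs = segs + [(t, t, lab)]
                    # Dynamic programming step: keep the best per label.
                    if lab not in candidates or new_score > candidates[lab][0]:
                        candidates[lab] = (new_score, new_segs)
        # Beam pruning: keep only the highest-scoring hypotheses.
        ranked = sorted(candidates.items(), key=lambda kv: kv[1][0], reverse=True)
        beam = dict(ranked[:beam_width])
    return max(beam.values(), key=lambda hyp: hyp[0])


if __name__ == "__main__":
    toy_scores = [
        {"reach": 1.0, "drink": 0.0},
        {"reach": 1.0, "drink": 0.0},
        {"reach": 0.0, "drink": 1.0},
        {"reach": 0.0, "drink": 1.0},
    ]
    toy_transition = {"reach": {"drink": -0.5}, "drink": {"reach": -0.5}}
    print(beam_segment(toy_scores, toy_transition, beam_width=2))
```

Keeping one hypothesis per current label is the dynamic programming part, and truncating to `beam_width` hypotheses per frame is the beam part. With a beam at least as wide as the label set, this toy version reduces to exact Viterbi-style decoding.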

Pages: 3272-3279

Page count: 8