Towards Scene Understanding with Detailed 3D Object Representations

Cited by: 49

Authors:

Zia, M. Zeeshan [1,2]

Stark, Michael [3]

Schindler, Konrad [1]

Affiliations:

[1] Swiss Fed Inst Technol, Zurich, Switzerland

[2] Univ London Imperial Coll Sci Technol & Med, London, England

[3] Max Planck Inst Informat, D-66123 Saarbrucken, Germany

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2015 / Volume 112 / Issue 02

Keywords:

TRACKING; MODELS;

DOI:

10.1007/s11263-014-0780-y

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Current approaches to semantic image and scene understanding typically employ rather simple object representations such as 2D or 3D bounding boxes. While such coarse models are robust and allow for reliable object detection, they discard much of the information about objects' 3D shape and pose, and thus do not lend themselves well to higher-level reasoning. Here, we propose to base scene understanding on a high-resolution object representation. An object class, in our case cars, is modeled as a deformable 3D wireframe, which enables fine-grained modeling at the level of individual vertices and faces. We augment that model to explicitly include vertex-level occlusion, and embed all instances in a common coordinate frame, in order to infer and exploit object-object interactions. Specifically, from a single view we jointly estimate the shapes and poses of multiple objects in a common 3D frame. A ground plane in that frame is estimated by consensus among different objects, which significantly stabilizes monocular 3D pose estimation. The fine-grained model, in conjunction with the explicit 3D scene model, further allows one to infer part-level occlusions between the modeled objects, as well as occlusions by other, unmodeled scene elements. To demonstrate the benefits of such detailed object class models in the context of scene understanding we systematically evaluate our approach on the challenging KITTI street scene dataset. The experiments show that the model's ability to utilize image evidence at the level of individual parts improves monocular 3D pose estimation w.r.t. both location and (continuous) viewpoint.
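
The abstract states that a ground plane is estimated by consensus among different objects. The sketch below shows one possible reading of that idea and is not the authors' implementation: it assumes each object contributes a single 3D ground-contact point (x, y, z) in the common frame, and it substitutes an ordinary least-squares fit of y = a*x + b*z + c for whatever estimator the paper actually uses. The function name `fit_ground_plane` and the point format are hypothetical.

```python
# Illustrative sketch only; not taken from the paper.
# Assumption: every object yields one ground-contact point (x, y, z)
# expressed in a shared 3D frame, with y as the height axis.

def fit_ground_plane(points):
    """Least-squares fit of the plane y = a*x + b*z + c to the points."""
    if len(points) < 3:
        raise ValueError("need at least three ground-contact points")

    # Accumulate the normal equations for design rows [x, z, 1].
    sxx = sxz = sx = szz = sz = sxy = szy = sy = 0.0
    for x, y, z in points:
        sxx += x * x
        sxz += x * z
        sx += x
        szz += z * z
        sz += z
        sxy += x * y
        szy += z * y
        sy += y
    n = float(len(points))

    # Augmented 3x4 system [A^T A | A^T y].
    m = [
        [sxx, sxz, sx, sxy],
        [sxz, szz, sz, szy],
        [sx, sz, n, sy],
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("points are degenerate (e.g. collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, 4):
                    m[r][k] -= f * m[col][k]

    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


# Hypothetical contact points lying on y = 0.1*x - 0.05*z + 1.5.
contacts = [(0.0, 1.5, 0.0), (1.0, 1.6, 0.0), (0.0, 1.45, 1.0), (2.0, 1.6, 2.0)]
a, b, c = fit_ground_plane(contacts)
```

With more objects in the scene, each contributes one more equation to the same fit, so a single poorly localized object has less influence on the plane. A robust variant (for example RANSAC) would be a natural alternative under the same assumptions.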


Pages: 188-203

Page count: 16

50 items in total

[1] Detailed 3D Representations for Object Recognition and Modeling
Zia, M. Zeeshan
Stark, Michael
Schiele, Bernt
Schindler, Konrad
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (11) : 2608 - 2623
[2] Monocular 3D Scene Modeling and Inference: Understanding Multi-Object Traffic Scenes
Wojek, Christian
Roth, Stefan
Schindler, Konrad
Schiele, Bernt
COMPUTER VISION-ECCV 2010, PT IV, 2010, 6314 : 467 - 481
[3] 3D Traffic Scene Understanding from Movable Platforms
Geiger, Andreas
Lauer, Martin
Wojek, Christian
Stiller, Christoph
Urtasun, Raquel
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (05) : 1012 - 1025
[4] Integrating 3D structure into traffic scene understanding with RGB-D data
Xia, Yingjie
Xu, Weiwei
Zhang, Luming
Shi, Xingmin
Mao, Kuang
NEUROCOMPUTING, 2015, 151 : 700 - 709
[5] Temporal Point Cloud Fusion With Scene Flow for Robust 3D Object Tracking
Yang, Yanding
Jiang, Kun
Yang, Diange
Jiang, Yanqin
Lu, Xiaowei
IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1579 - 1583
[6] 3D sketching for 3D object retrieval
Li, Bo
Yuan, Juefei
Ye, Yuxiang
Lu, Yijuan
Zhang, Chaoyang
Tian, Qi
MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (06) : 9569 - 9595
[7] Resolving 3D Human Pose Ambiguities with 3D Scene Constraints
Hassan, Mohamed
Choutas, Vasileios
Tzionas, Dimitrios
Black, Michael J.
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 2282 - 2292
[8] Deep Learning Approach to Point Cloud Scene Understanding for Automated Scan to 3D Reconstruction
Chen, Jingdao
Kira, Zsolt
Cho, Yong K.
JOURNAL OF COMPUTING IN CIVIL ENGINEERING, 2019, 33 (04)
[9] MULTI-VIEW OBJECT AND HUMAN BODY PART DETECTION UTILIZING 3D SCENE INFORMATION
Sfiris, Georgios
Nikolaidis, Nikolaos
Pitas, Ioannis
2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010, : 29 - 32
[10] Semantic Scene Builder: Towards a Context Sensitive Text-to-3D Scene Framework
Henlein, Alexander
Kett, Attila
Baumartz, Daniel
Abrami, Giuseppe
Mehler, Alexander
Bastian, Johannes
Blecher, Yannic
Budgenhagen, David
Christof, Roman
Ewald, Tim-Oliver
Fauerbach, Tim
Masny, Patrick
Mende, Julian
Schnuere, Paul
Viel, Marc
DIGITAL HUMAN MODELING AND APPLICATIONS IN HEALTH, SAFETY, ERGONOMICS AND RISK MANAGEMENT, DHM 2023, PT II, 2023, 14029 : 461 - 479
