Towards Scene Understanding with Detailed 3D Object Representations

Cited by: 49
Authors
Zia, M. Zeeshan [1, 2]
Stark, Michael [3]
Schindler, Konrad [1]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] Univ London Imperial Coll Sci Technol & Med, London, England
[3] Max Planck Inst Informat, D-66123 Saarbrucken, Germany
Keywords
TRACKING; MODELS
DOI
10.1007/s11263-014-0780-y
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Current approaches to semantic image and scene understanding typically employ rather simple object representations such as 2D or 3D bounding boxes. While such coarse models are robust and allow for reliable object detection, they discard much of the information about objects' 3D shape and pose, and thus do not lend themselves well to higher-level reasoning. Here, we propose to base scene understanding on a high-resolution object representation. An object class (in our case, cars) is modeled as a deformable 3D wireframe, which enables fine-grained modeling at the level of individual vertices and faces. We augment that model to explicitly include vertex-level occlusion, and embed all instances in a common coordinate frame, in order to infer and exploit object-object interactions. Specifically, from a single view we jointly estimate the shapes and poses of multiple objects in a common 3D frame. A ground plane in that frame is estimated by consensus among different objects, which significantly stabilizes monocular 3D pose estimation. The fine-grained model, in conjunction with the explicit 3D scene model, further allows one to infer part-level occlusions between the modeled objects, as well as occlusions by other, unmodeled scene elements. To demonstrate the benefits of such detailed object class models in the context of scene understanding, we systematically evaluate our approach on the challenging KITTI street scene dataset. The experiments show that the model's ability to utilize image evidence at the level of individual parts improves monocular 3D pose estimation w.r.t. both location and (continuous) viewpoint.
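As a rough illustration of the ground-plane consensus idea mentioned in the abstract, the sketch below fits one plane through the estimated base points of several objects and then snaps each object onto it. It is not the paper's estimation procedure, which the abstract does not detail. The least-squares formulation, the camera-style coordinate convention (y as height), and the names `fit_ground_plane` and `snap_to_plane` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical convention: (x, y, z) with y as height and z as depth.
Point = Tuple[float, float, float]
# Plane written as y = a*x + b*z + c.
Plane = Tuple[float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_ground_plane(base_points: List[Point]) -> Plane:
    """Least-squares fit of y = a*x + b*z + c through object base points.

    Each object contributes one point, so the plane reflects agreement
    among all objects (an assumed stand-in for 'consensus').
    """
    if len(base_points) < 3:
        raise ValueError("need at least three object base points")
    n = float(len(base_points))
    sx = sum(p[0] for p in base_points)
    sy = sum(p[1] for p in base_points)
    sz = sum(p[2] for p in base_points)
    sxx = sum(p[0] * p[0] for p in base_points)
    szz = sum(p[2] * p[2] for p in base_points)
    sxz = sum(p[0] * p[2] for p in base_points)
    sxy = sum(p[0] * p[1] for p in base_points)
    szy = sum(p[2] * p[1] for p in base_points)

    # Normal equations A @ [a, b, c] = rhs, solved with Cramer's rule.
    A = [[sxx, sxz, sx], [sxz, szz, sz], [sx, sz, n]]
    rhs = [sxy, szy, sy]
    d = _det3(A)
    if abs(d) < 1e-12:
        raise ValueError("base points are degenerate (e.g. collinear)")
    coeffs = []
    for col in range(3):
        # Replace one column of A with the right-hand side.
        m = [[rhs[r] if c == col else A[r][c] for c in range(3)]
             for r in range(3)]
        coeffs.append(_det3(m) / d)
    return coeffs[0], coeffs[1], coeffs[2]


def snap_to_plane(point: Point, plane: Plane) -> Point:
    """Move a point vertically so that it lies on the fitted plane."""
    a, b, c = plane
    x, _, z = point
    return (x, a * x + b * z + c, z)
```

In this sketch, an object whose individually estimated height disagrees with the others is pulled back onto the shared plane, which is one simple way to picture how a common ground plane could stabilize per-object monocular pose estimates.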
Pages: 188-203
Number of pages: 16