Learning Spatial Common Sense with Geometry-Aware Recurrent Networks

Cited by: 39
Authors
Tung, Hsiao-Yu Fish [1]
Cheng, Ricson [2]
Fragkiadaki, Katerina [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Uber Adv Technol Grp, Pittsburgh, PA USA
Source
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00270
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We integrate two powerful ideas, geometry and deep visual representation learning, into recurrent network architectures for mobile visual scene understanding. The proposed networks learn to "lift" 2D visual features and integrate them over time into latent 3D feature maps of the scene. They are equipped with differentiable geometric operations, such as projection, unprojection, egomotion estimation and stabilization, in order to compute a geometrically-consistent mapping between the world scene and their 3D latent feature space. We train the proposed architectures to predict novel image views given short frame sequences as input. Their predictions strongly generalize to scenes with a novel number of objects, appearances and configurations, and greatly outperform predictions of previous works that do not consider egomotion stabilization or a space-aware latent feature space. We train the proposed architectures to detect and segment objects in 3D, using the latent 3D feature map as input, as opposed to 2D feature maps computed from video frames. The resulting detections are permanent: they continue to exist even when an object gets occluded or leaves the field of view. Our experiments suggest the proposed space-aware latent feature arrangement and egomotion-stabilized convolutions are essential architectural choices for spatial common sense to emerge in artificial embodied visual agents.
Pages: 2590-2598
Page count: 9