Omnisupervised Omnidirectional Semantic Segmentation

Cited by: 35
Authors
Yang, Kailun [1]
Hu, Xinxin [2]
Fang, Yicheng [2]
Wang, Kaiwei [2]
Stiefelhagen, Rainer [1]
Affiliations
[1] Karlsruhe Inst Technol, Inst Anthropomat & Robot, D-76131 Karlsruhe, Germany
[2] Zhejiang Univ, State Key Lab Modern Opt Instrumentat, Hangzhou 310027, Peoples R China
Keywords
Semantics; Image segmentation; Training; Data models; Sensors; Task analysis; Cameras; Intelligent vehicles; scene understanding; semantic segmentation; scene parsing; omnisupervised learning; omnidirectional images; VIDEO;
DOI
10.1109/TITS.2020.3023331
CLC Classification Number
TU [Building Science]
Discipline Classification Code
0813
Abstract
Modern efficient Convolutional Neural Networks (CNNs) can perform semantic segmentation both swiftly and accurately, covering in a unified way the typically separate detection tasks required by Intelligent Vehicles (IV). Most current semantic perception frameworks, however, are designed for pinhole cameras and benchmarked against public datasets of narrow Field-of-View (FoV) images. Accuracy degrades sharply when a CNN trained on pinhole imagery is applied to omnidirectional imagery, making it unreliable for surround perception. In this paper, we propose an omnisupervised learning framework for efficient CNNs that bridges multiple heterogeneous data sources already available in the community, bypassing the labor-intensive process of manually annotating panoramas while improving reliability in unseen omnidirectional domains. Being omnisupervised, the efficient CNN exploits both labeled pinhole images and unlabeled panoramas. The framework builds on our specialized ensemble method, which accounts for the wide-angle and wrap-around properties of omnidirectional images to automatically generate panoramic labels for data distillation. A comprehensive variety of experiments demonstrates that the proposed solution attains significant generalizability gains on panoramic imagery domains. Our approach outperforms state-of-the-art efficient segmenters on the highly unconstrained IDD20K and PASS datasets.
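The ensemble pseudo-labeling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the choice of horizontal shifts, and the use of NumPy are all assumptions. The sketch exploits the panorama's 360° wrap-around by circularly shifting the image, averaging the ensemble's per-pixel class probabilities across the shifted views, and taking the argmax as the pseudo-label used for data distillation.

```python
import numpy as np

def ensemble_panoramic_pseudo_labels(panorama, models, shifts=(0, 256, 512)):
    """Sketch of ensemble pseudo-labeling on an unlabeled panorama.

    Exploits the panorama's wrap-around property: the image is
    circularly shifted along the horizontal axis (a valid view of a
    360-degree image), each model predicts per-pixel class
    probabilities on each view, predictions are shifted back into
    alignment and averaged, and the argmax yields the pseudo-label.

    panorama: (H, W, 3) array; models: callables mapping an
    (H, W, 3) image to (H, W, n_classes) probabilities.
    """
    accum = None
    for model in models:
        for s in shifts:
            view = np.roll(panorama, shift=s, axis=1)   # wrap-around shift
            probs = model(view)                          # (H, W, n_classes)
            probs = np.roll(probs, shift=-s, axis=1)     # undo the shift
            accum = probs if accum is None else accum + probs
    accum /= len(models) * len(shifts)                   # average the ensemble
    # Hard pseudo-labels for distillation, plus soft averaged targets.
    return accum.argmax(axis=-1), accum
```

In the omnisupervised setting, the resulting pseudo-labels (or the soft averaged probabilities) would then supervise the efficient student CNN on the unlabeled panoramas, alongside the ordinary supervised loss on labeled pinhole images.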
Pages: 1184-1199
Page count: 16