Lifting the Veil of Frequency in Joint Segmentation and Depth Estimation

Cited by: 2
Authors
Fu, Tianhao [1]
Li, Yingying [1]
Ye, Xiaoqing [1]
Tan, Xiao [1]
Sun, Hao [1]
Shen, Fumin [2]
Ding, Errui [1]
Affiliations
[1] Baidu, Beijing, People's Republic of China
[2] University of Electronic Science and Technology of China, Chengdu, People's Republic of China
Source
Proceedings of the 29th ACM International Conference on Multimedia, MM 2021 | 2021
Keywords
Multi-task learning; Semantic segmentation; Depth estimation
DOI
10.1145/3474085.3475277
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Joint learning of scene parsing and depth estimation remains challenging because the two tasks compete with each other. In this paper, we revisit mutual enhancement for joint semantic segmentation and depth estimation. Inspired by the observation that competition and cooperation between tasks are reflected in the frequency components of their features, we propose a Frequency Aware Feature Enhancement (FAFE) network that strengthens the reciprocal relationship while avoiding the competition. In FAFE, a frequency disentanglement module selects the favorable set of frequency components for each task and resolves the discordance between the two tasks. For task cooperation, we introduce a re-calibration unit that aggregates the features of the two tasks so that each complements the other with task-specific information. The learning of each task is thereby boosted by the complementary task. In addition, a novel local-aware consistency loss is imposed on the predicted segmentation and depth to strengthen the cooperation. With the FAFE network and the local-aware consistency loss integrated into a multi-task learning network, the proposed approach outperforms previous state-of-the-art methods. Extensive experiments and ablation studies on multi-task datasets demonstrate its effectiveness.
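The abstract does not say how the frequency disentanglement module is built, so the following is only a rough illustration of what splitting a feature into frequency components can mean, not the FAFE module itself. It assumes a one-dimensional feature row and a hard cutoff on the discrete Fourier transform; the function names `dft`, `idft`, and `split_frequency_bands`, the `cutoff` parameter, and the sample values are all hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a sequence (O(n^2))."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split_frequency_bands(signal, cutoff):
    """Split a real 1-D signal into low- and high-frequency parts.

    Hypothetical helper, not taken from the paper. A DFT bin k belongs to
    the low band when its distance from the DC bin, min(k, n - k), is at
    most `cutoff`. The mask is symmetric, so both parts stay real-valued.
    """
    n = len(signal)
    low_spec, high_spec = [], []
    for k, coeff in enumerate(dft(signal)):
        if min(k, n - k) <= cutoff:
            low_spec.append(coeff)
            high_spec.append(0)
        else:
            low_spec.append(0)
            high_spec.append(coeff)
    low = [value.real for value in idft(low_spec)]
    high = [value.real for value in idft(high_spec)]
    return low, high


# Illustrative feature row with one sharp edge (made-up values).
row = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
low, high = split_frequency_bands(row, cutoff=1)

# The two bands add back up to the original row.
reconstructed = [l + h for l, h in zip(low, high)]
```

In this sketch the low band carries the mean level and the coarse shape of the row, while the high band carries the detail around the edge. A learned module could weight such bands differently for each task, but that design is an assumption here and is not described in the abstract.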
Pages: 944-952
Number of pages: 9