SSNet: a joint learning network for semantic segmentation and disparity estimation

Cited by: 1
Authors
Jia, Dayu [1, 2, 3]
Pang, Yanwei [1, 5]
Cao, Jiale [1, 5]
Jing, Pan [4]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang 110016, Peoples R China
[3] Chinese Acad Sci, Inst Robot & Intelligent Mfg, Shenyang 110169, Peoples R China
[4] Tianjin Univ Technol & Educ, Sch Elect Engn, Tianjin 300222, Peoples R China
[5] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Keywords
Joint learning; Semantic segmentation; Stereo disparity estimation; Transformer
DOI
10.1007/s00371-024-03336-z
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Joint learning of semantic segmentation and disparity estimation is applied to scene parsing so that the two tasks benefit each other. However, existing joint learning approaches unify the two tasks in a simplistic way, which may result in negative feature mixing. To solve this problem, a win-win approach, the Stereo Semantic Network (SSNet), is proposed for pixel-wise scene parsing. SSNet is the first Transformer-based end-to-end joint learning model for semantic segmentation and disparity estimation. The main novelty lies in the proposed Transformer Feature Separation Module (TFSM), which is designed to separate features for segmentation prediction and disparity regression according to the characteristics of the two tasks. The segmentation and disparity results are supervised jointly with a weighted summation loss function to improve the performance of both tasks. Experimental results on the Cityscapes and KITTI 2015 datasets demonstrate that SSNet outperforms state-of-the-art joint learning approaches.
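The weighted summation loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the per-task losses or the weights, so everything here is an assumption for illustration: per-pixel cross-entropy for segmentation, smooth L1 for disparity (both common choices for these tasks), and the hypothetical names `joint_loss`, `w_seg`, and `w_disp`.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one pixel (assumed segmentation loss)."""
    return -math.log(probs[label])


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 error for one pixel (assumed disparity loss)."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def joint_loss(seg_probs, seg_labels, disp_pred, disp_gt, w_seg=1.0, w_disp=1.0):
    """Weighted sum of the mean segmentation loss and the mean disparity loss.

    w_seg and w_disp are hypothetical weights; the record does not give
    the values used for SSNet.
    """
    seg = sum(cross_entropy(p, y) for p, y in zip(seg_probs, seg_labels)) / len(seg_labels)
    disp = sum(smooth_l1(d, g) for d, g in zip(disp_pred, disp_gt)) / len(disp_gt)
    return w_seg * seg + w_disp * disp


# Toy example: one pixel, two classes, disparity off by 2 pixels.
loss = joint_loss([[0.5, 0.5]], [0], [2.0], [0.0], w_seg=1.0, w_disp=2.0)
```

Because both tasks contribute to a single scalar, one backward pass would update shared features for both objectives; the relative weights control which task dominates.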
Pages: 423-435
Page count: 13