A serial semantic segmentation model based on encoder-decoder architecture

Cited by: 2
Authors
Zhou, Yan [1, 2]
Affiliations
[1] Zhejiang Univ, Ocean Coll, Zhoushan 316021, Peoples R China
[2] Minist Nat Resources, Inst Oceanog 2, State Key Lab Satellite Ocean Environm Dynam, Hangzhou 310012, Peoples R China
Keywords
Semantic segmentation; Lawin transformer; CNN; Encoder-decoder; Attention mechanism; Deep convolutional networks; Fusion network
DOI
10.1016/j.knosys.2024.111819
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The thriving progress of Convolutional Neural Networks (CNNs) and the outstanding efficacy of Visual Transformers (ViTs) have delivered impressive outcomes in the domain of semantic segmentation. However, each model in isolation entails a trade-off between high computational complexity and compromised computational efficiency. To address this challenge, we effectively combine the CNN and encoder-decoder structures in a Transformer-inspired fashion, presenting the Serial Semantic Segmentation Trans via CNN Former (SSS-Former) model. To augment the feature extraction capability, we utilize the meticulously crafted SSS-CSPNet, resulting in a well-designed architecture for the holistic model. We propose a novel SSS-PN attention network that enhances the spatial topological connections of features, leading to improved overall performance. Additionally, the integration of SASPP bridges the semantic gap between multi-scale features and enhances segmentation ability for overlapping objects. To fulfill the requirement of real-time segmentation, we leverage a novel restructuring technique to devise a more lightweight and faster ResSSS-Former model. Abundant experimental results demonstrate that both SSS-Former and ResSSS-Former outperform existing state-of-the-art methods in terms of computational efficiency, result precision, and speed. Remarkably, SSS-Former achieves an mIoU of 58.63% at 89.1 FPS on the ADE20K dataset. On the validation and testing datasets of Cityscapes, it obtains mIoU scores of 85.1% and 85.2%, respectively, with a speed of 94.1 FPS. Our optimized ResSSS-Former achieves impressive real-time segmentation results, with an astonishing 100+ FPS while maintaining high segmentation accuracy. The compelling results from the ISPRS datasets further validate the effectiveness of our proposed models in segmenting multi-scale and overlapping objects.
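For readers unfamiliar with the mIoU figures quoted in the abstract, the following is a minimal sketch of how mean intersection-over-union is commonly computed from predicted and ground-truth label maps. It is not taken from the paper: the function names `confusion_matrix` and `mean_iou`, the `ignore_index` value of 255, and the choice to average only over classes that actually occur are all assumptions made for illustration, and the authors' evaluation protocol may differ.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Accumulate a (num_classes x num_classes) matrix; rows are ground truth.

    ignore_index is an assumed convention for unlabeled pixels.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    keep = target != ignore_index  # drop unlabeled pixels
    # Encode each (target, pred) pair as a single integer and count them.
    idx = target[keep].astype(np.int64) * num_classes + pred[keep].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(cm):
    """Mean IoU over classes that appear in the prediction or the ground truth."""
    tp = np.diag(cm).astype(np.float64)
    # union = predicted-as-class + labeled-as-class - intersection
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0  # skip classes absent from both maps (assumed choice)
    return float((tp[present] / union[present]).mean())


if __name__ == "__main__":
    pred = [[0, 1], [1, 1]]
    target = [[0, 1], [0, 255]]  # last pixel is ignored
    cm = confusion_matrix(pred, target, num_classes=2)
    print(cm)
    print(mean_iou(cm))
```

In the toy example, each class has one correctly labeled pixel and a union of two pixels, so both per-class IoU values are 0.5. Benchmark scores are normally obtained by accumulating the confusion matrix over the whole evaluation set before taking the ratio, rather than averaging per-image values.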
Pages: 18