DCaT: Lightweight Semantic Segmentation Model for High-Resolution Scenes

Cited by: 0
Authors
Huang, Kedi [1 ,2 ]
Huang, Heming [1 ,2 ]
Li, Wei [1 ,2 ]
Fan, Yonghong [1 ,2 ]
Affiliations
[1] School of Computer Science and Technology, Qinghai Normal University, Xining
[2] State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining
Keywords
high resolution; lightweight; semantic segmentation; sparse attention; Transformer
DOI
10.3778/j.issn.1002-8331.2308-0315
Abstract
Semantic segmentation is a critical task in computer vision for analyzing and understanding scenes. However, existing segmentation models incur high computational cost and memory demands, making them unsuitable for lightweight semantic segmentation in high-resolution scenes. To address this issue, a novel lightweight semantic segmentation model called DCaT is proposed, designed specifically for high-resolution scenes. First, the model extracts local low-level semantics from the image using depthwise separable convolution; second, global high-level semantics are obtained using a lightweight Transformer based on coordinate-aware and dynamic sparse mixed attention; then, the high-level semantics are injected into the low-level semantics through a fusion module; finally, pixel-wise prediction labels are output through the segmentation head. Experimental results on the high-resolution dataset Cityscapes show that, compared with the baseline model, DCaT improves mean intersection over union by 1.5 percentage points, reduces model complexity by 26%, and increases inference speed by 12%. The model thus achieves a better balance between complexity and performance in high-resolution scenarios, demonstrating the effectiveness and practicality of DCaT. © 2025 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
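The depthwise separable convolution that the abstract credits for the model's lightweight low-level feature extraction can be sketched in NumPy. This is a minimal illustration of the operation and its parameter savings, not the paper's implementation; the channel counts and kernel size below are assumptions chosen for the example.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """x: (in_c, h, w); dw_kernels: (in_c, k, k); pw_kernels: (out_c, in_c)."""
    c, h, w = x.shape
    k = dw_kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    # Depthwise step: each channel is filtered with its own k x k kernel
    dw = np.zeros_like(x)
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                dw[ch, i, j] = np.sum(xp[ch, i:i + k, j:j + k] * dw_kernels[ch])
    # Pointwise step: a 1x1 convolution mixes information across channels
    return np.tensordot(pw_kernels, dw, axes=([1], [0]))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))          # illustrative: 4 channels, 8x8 map
out = depthwise_separable_conv(
    x,
    rng.standard_normal((4, 3, 3)),         # one 3x3 kernel per input channel
    rng.standard_normal((16, 4)),           # 1x1 mixing to 16 output channels
)
print(out.shape)  # (16, 8, 8)

# Why this is "lightweight": weight counts for in_c=4, out_c=16, k=3
standard_params = 16 * 4 * 3 * 3        # 576 weights in a standard conv
separable_params = 4 * 3 * 3 + 16 * 4   # 100 weights in the separable version
print(standard_params, separable_params)
```

The factorization into per-channel spatial filtering plus a 1x1 channel mix is what cuts the parameter and FLOP count, which is why it is a common backbone choice for lightweight segmentation models.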
Pages: 252-262 (10 pages)