Rethinking DABNet: Light-Weight Network for Real-Time Semantic Segmentation of Road Scenes

Cited by: 2
Authors
Mazhar S. [1]
Atif N. [1]
Bhuyan M.K. [1]
Ahamed S.R. [1]
Affiliation
[1] Indian Institute of Technology Guwahati, Department of Electronics and Electrical Engineering, Guwahati, Assam, India
Source
IEEE Transactions on Artificial Intelligence | 2024 / Vol. 5 / No. 6
Keywords
Autonomous driving; deep convolutional neural networks; real-time; road scenes; semantic segmentation
DOI
10.1109/TAI.2023.3341976
Abstract
Recent advancements in autonomous driving and mobile devices have driven the development of real-time, lightweight semantic image segmentation models. However, these algorithms inherently suffer from accuracy loss compared to large networks. DABNet (Li et al., 2019) presented a highly efficient method to balance the accuracy-model size tradeoff. Nevertheless, the bottleneck structure and single-scale receptive field of its building block limit its performance for the given network size. To further improve the segmentation score and reduce the number of parameters, the basic block is redesigned using an inverted-residual and dilation pyramid structure (IRDP). The IRDP module efficiently learns contextual features at multiple dilation rates within the block. Using the inverted-residual structure with an expansion layer prevents the information loss caused by dimensionality reduction of the feature space. The IRDP block is used to rebuild the DABNet structure so that it runs in real time on resource-constrained devices. In addition, a fast-lightweight decoder (FLD) is proposed to improve the segmentation accuracy of the network. Experiments on the Cityscapes and Cambridge-driving Labeled Video Database (CamVid) datasets demonstrate the effectiveness of the proposed approach. On Cityscapes, IRDPNet achieves a mean intersection-over-union (mIoU) of 75.62%, while the lighter version achieves an mIoU of 71.32% with only 0.32 million parameters, matching the accuracy of DABNet with half as many parameters. © 2024 IEEE.
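
The abstract describes the IRDP block only at a high level. The following PyTorch sketch illustrates one plausible reading, assuming a MobileNetV2-style inverted residual whose depthwise stage is a pyramid of parallel dilated convolutions; the class name IRDPBlock, the expansion factor, the dilation rates, and the normalization/activation placement are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn as nn

class IRDPBlock(nn.Module):
    # Hypothetical sketch of the IRDP block: pointwise expansion,
    # a pyramid of parallel depthwise dilated convolutions, and a
    # linear pointwise projection with a residual connection.
    def __init__(self, channels, expansion=2, dilations=(1, 2, 4)):
        super().__init__()
        hidden = channels * expansion
        # Expansion layer: widen the feature space first so the block
        # avoids the information loss of a narrowing bottleneck.
        self.expand = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
        )
        # Dilation pyramid: parallel depthwise convolutions capture
        # context at multiple receptive fields within one block.
        self.branches = nn.ModuleList([
            nn.Conv2d(hidden, hidden, 3, padding=d, dilation=d,
                      groups=hidden, bias=False)
            for d in dilations
        ])
        # Linear projection back to the input width (no activation,
        # as in MobileNetV2-style inverted residuals).
        self.project = nn.Sequential(
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        h = self.expand(x)
        h = sum(branch(h) for branch in self.branches)  # fuse scales
        return x + self.project(h)  # residual connection

block = IRDPBlock(32)
out = block(torch.randn(1, 32, 64, 64))
print(out.shape)  # torch.Size([1, 32, 64, 64])

Summing the branch outputs keeps the block single-width; concatenation followed by a 1x1 fusion convolution would be an equally plausible reading of the dilation pyramid.
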
Pages: 3098-3108
Page count: 10
References
43 in total
[1] Li G., Yun I.Y., Kim J., Kim J., DABNet: Depth-wise asymmetric bottleneck for real-time semantic segmentation, Proc. Brit. Mach. Vision Conf. (BMVC), pp. 1-12, (2019)
[2] Strudel R., Garcia R., Laptev I., Schmid C., Segmenter: Transformer for semantic segmentation, Proc. IEEE/CVF Int. Conf. Comput. Vision (ICCV), pp. 7242-7252, (2021)
[3] Paszke A., Chaurasia A., Kim S., Culurciello E., ENet: A deep neural network architecture for real-time semantic segmentation, Proc. 4th Int. Conf. Learn. Representations (ICLR), (2016)
[4] He K., Zhang X., Ren S., Sun J., Deep residual learning for image recognition, Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), pp. 770-778, (2016)
[5] Romera E., Alvarez J.M., Bergasa L.M., Arroyo R., ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation, IEEE Trans. Intell. Transp. Syst., 19, 1, pp. 263-272, (2018)
[6] Howard A.G., et al., MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861, (2017)
[7] Sandler M., Howard A., Zhu M., Zhmoginov A., Chen L.-C., MobileNetV2: Inverted residuals and linear bottlenecks, Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), pp. 4510-4520, (2018)
[8] Shelhamer E., Long J., Darrell T., Fully convolutional networks for semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intell., 39, 4, pp. 640-651, (2017)
[9] Zhao H., Shi J., Qi X., Wang X., Jia J., Pyramid scene parsing network, Proc. IEEE Conf. Comput. Vision Pattern Recognit. (CVPR), pp. 6230-6239, (2017)
[10] Yang M., Yu K., Zhang C., Li Z., Yang K., DenseASPP for semantic segmentation in street scenes, Proc. IEEE/CVF Conf. Comput. Vision Pattern Recognit. (CVPR), pp. 3684-3692, (2018)