RAFNet: Reparameterizable Across-Resolution Fusion Network for Real-Time Image Semantic Segmentation

Cited by: 8

Authors:

Chen, Lei [1]

Dai, Huhe [1]

Zheng, Yuan [2]

Affiliations:

[1] Inner Mongolia Univ, Coll Elect Informat Engn, Hohhot 010021, Peoples R China

[2] Inner Mongolia Univ, Coll Comp Sci, Natl & Local Joint Engn Res Ctr Intelligent Inform, Hohhot 010021, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 2

Funding:

National Natural Science Foundation of China;

Keywords:

Real-time image segmentation; encoder-decoder structure; lightweight network; hardware deployment; AGGREGATION;

DOI:

10.1109/TCSVT.2023.3293166

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

The demand to implement semantic segmentation networks on mobile devices has increased dramatically. However, existing real-time semantic segmentation methods still have a large number of network parameters, which makes them unsuitable for mobile devices with limited memory resources. The main reason is that most existing methods use a backbone network (e.g., ResNet-18 or MobileNet) as the encoder. To alleviate this problem, we propose a novel Reparameterizable Channel & Dilation (RCD) block and construct a considerably lightweight yet effective encoder by stacking several RCD blocks according to three guidelines. The proposed encoder not only extracts discriminative feature representations via channel convolutions and dilated convolutions, but also reduces the computational burden while maintaining segmentation accuracy with the help of the re-parameterization technique. In addition to the encoder, we present a simple but effective decoder that adopts an across-resolution fusion strategy, instead of a bottom-up pathway fusion, to fuse the multi-scale feature maps generated by the encoder. With this encoder and decoder, we provide a Reparameterizable Across-resolution Fusion Network (RAFNet) for real-time semantic segmentation. Extensive experiments demonstrate that RAFNet achieves a promising trade-off among segmentation accuracy, inference speed, and number of network parameters. Specifically, RAFNet with only 0.96M parameters obtains 75.3% mIoU at 107 FPS on the Cityscapes test set and 75.8% mIoU at 195 FPS on the CamVid test set for full-resolution inputs. After quantization and deployment on a Xilinx ZCU104 device, RAFNet obtains favorable segmentation performance with only 1.4 W of power.
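The abstract refers to a re-parameterization technique without giving details. The sketch below illustrates the general idea in the RepVGG style (reference [8]): parallel 3x3, 1x1, and identity branches used during training can be folded into one 3x3 kernel for inference, because convolution is linear. It is not the RCD block from the paper. It assumes a single channel, stride 1, zero padding, and no batch normalization, and all function names are hypothetical.

```python
# RepVGG-style structural re-parameterization on a single channel.
# Illustrative only: NOT the paper's RCD block; names are hypothetical.

def conv2d_same(x, k):
    """3x3 cross-correlation with zero padding (single channel, stride 1)."""
    h, w = len(x), len(x[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = acc
    return out


def merge_branches(k3, k1):
    """Fold a 1x1 kernel (the scalar k1) and an identity shortcut into a 3x3 kernel."""
    merged = [row[:] for row in k3]
    # Both the 1x1 weight and the identity act only on the centre tap.
    merged[1][1] += k1 + 1
    return merged


def multi_branch(x, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch."""
    y3 = conv2d_same(x, k3)
    return [
        [y3[i][j] + k1 * x[i][j] + x[i][j] for j in range(len(x[0]))]
        for i in range(len(x))
    ]


x = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]
k3 = [[1, 0, -1], [2, 1, 0], [0, -2, 1]]
k1 = 3

# Inference-time form: one 3x3 convolution with the merged kernel.
fused = conv2d_same(x, merge_branches(k3, k1))
assert fused == multi_branch(x, k3, k1)
```

The single merged convolution reproduces the three-branch output exactly, which is why the extra branches add no cost at inference time. In a real network each branch would typically also carry batch-normalization statistics that must be folded into the kernel and bias before merging.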

Citation

Pages: 1212-1227

Page count: 16

46 entries in total

[1] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J].

Badrinarayanan, Vijay; Kendall, Alex; Cipolla, Roberto.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12): 2481-2495

[2] Deep Spatio-Temporal Random Fields for Efficient Video Segmentation [J].

Chandra, Siddhartha; Couprie, Camille; Kokkinos, Iasonas.

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 8915-8924

[3] HarDNet: A Low Memory Traffic Network [J].

Chao, Ping; Kao, Chao-Yang; Ruan, Yu-Shan; Huang, Chien-Hsiang; Lin, Youn-Long.

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 3551-3560

[4] JSPNet: Learning joint semantic & instance segmentation of point clouds via feature self-similarity and cross-task probability [J].

Chen, Feng; Wu, Fei; Gao, Guangwei; Ji, Yimu; Xu, Jing; Jiang, Guo-Ping; Jing, Xiao-Yuan.

PATTERN RECOGNITION, 2022, 122

[5] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [J].

Chen, Liang-Chieh; Zhu, Yukun; Papandreou, George; Schroff, Florian; Adam, Hartwig.

COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211: 833-851

[6] Chen W., 2020, P INT C LEARN REPR I, P1

[7] The Cityscapes Dataset for Semantic Urban Scene Understanding [J].

Cordts, Marius; Omran, Mohamed; Ramos, Sebastian; Rehfeld, Timo; Enzweiler, Markus; Benenson, Rodrigo; Franke, Uwe; Roth, Stefan; Schiele, Bernt.

2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 3213-3223

[8] RepVGG: Making VGG-style ConvNets Great Again [J].

Ding, Xiaohan; Zhang, Xiangyu; Ma, Ningning; Han, Jungong; Ding, Guiguang; Sun, Jian.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 13728-13737

[9] Rethinking BiSeNet For Real-time Semantic Segmentation [J].

Fan, Mingyuan; Lai, Shenqi; Huang, Junshi; Wei, Xiaoming; Chai, Zhenhua; Luo, Junfeng; Wei, Xiaolin.

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 9711-9720

[10] SSAP: Single-Shot Instance Segmentation With Affinity Pyramid [J].

Gao, Naiyu; Shan, Yanhu; Wang, Yupei; Zhao, Xin; Yu, Yinan; Yang, Ming; Huang, Kaiqi.

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 642-651
