Sparse Refinement for Efficient High-Resolution Semantic Segmentation

Cited by: 0

Authors:

Liu, Zhijian [1,2]

Zhang, Zhuoyang [3]

Khaki, Samir [4]

Yang, Shang [1]

Tang, Haotian [1]

Xu, Chenfeng [5]

Keutzer, Kurt [5]

Han, Song [1,2]

Affiliations:

[1] MIT, Cambridge, MA 02139 USA

[2] NVIDIA, Cambridge, MA 02138 USA

[3] Tsinghua Univ, Beijing, Peoples R China

[4] Univ Toronto, Toronto, ON, Canada

[5] Univ Calif Berkeley, Berkeley, CA USA

Source:

COMPUTER VISION - ECCV 2024, PT LXVII | 2025 / Vol. 15125

Funding:

National Science Foundation (USA);

Keywords:

DOI:

10.1007/978-3-031-72855-6_7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Semantic segmentation empowers numerous real-world applications, such as autonomous driving and augmented/mixed reality. These applications often operate on high-resolution images (e.g., 8 megapixels) to capture fine details. However, this comes at the cost of considerable computational complexity, hindering deployment in latency-sensitive scenarios. In this paper, we introduce SparseRefine, a novel approach that enhances dense low-resolution predictions with sparse high-resolution refinements. Based on coarse low-resolution outputs, SparseRefine first uses an entropy selector to identify a sparse set of pixels with high entropy. It then employs a sparse feature extractor to efficiently generate the refinements for those pixels of interest. Finally, it leverages a gated ensembler to apply these sparse refinements to the initial coarse predictions. SparseRefine can be seamlessly integrated into any existing semantic segmentation model, whether CNN- or ViT-based. It achieves speedups of 1.5 to 3.7 times when applied to HRNet-W48, SegFormer-B5, Mask2Former-T/L and SegNeXt-L on Cityscapes, with negligible to no loss of accuracy. Our "dense+sparse" paradigm paves the way for efficient high-resolution visual computing.
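The select-then-blend idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed entropy threshold, the dictionary-based sparse representation, and the hand-set gate values are all assumptions made for illustration. The sparse feature extractor is not modeled at all; its output is simply passed in as `refinements`, and the coarse scores are assumed to be already upsampled to the target resolution.

```python
import math

def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_uncertain(prob_map, threshold):
    """Return (row, col) coordinates whose entropy exceeds the threshold.

    prob_map[r][c] is the list of class probabilities for pixel (r, c).
    The threshold is a hypothetical tuning knob, not a value from the paper.
    """
    return [(r, c)
            for r, row in enumerate(prob_map)
            for c, probs in enumerate(row)
            if pixel_entropy(probs) > threshold]

def gated_ensemble(coarse, refinements, gates):
    """Blend sparse refinements into a copy of the coarse per-pixel scores.

    refinements and gates are dicts keyed by (row, col); each gate lies in
    [0, 1]. Here the gates are supplied by the caller as placeholders.
    """
    out = [[list(scores) for scores in row] for row in coarse]
    for (r, c), refined in refinements.items():
        g = gates[(r, c)]
        out[r][c] = [(1.0 - g) * a + g * b
                     for a, b in zip(coarse[r][c], refined)]
    return out

# Usage: a 2x2 map with two classes; only pixel (0, 0) is ambiguous.
prob_map = [[[0.5, 0.5], [0.99, 0.01]],
            [[0.99, 0.01], [1.0, 0.0]]]
selected = select_uncertain(prob_map, threshold=0.5)
```

Only the pixels returned by `select_uncertain` would be handed to the high-resolution refinement stage, which is where the computational saving in a "dense+sparse" scheme would come from; all other pixels keep their coarse prediction unchanged.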

Pages: 108-127

Page count: 20

