Sparse Refinement for Efficient High-Resolution Semantic Segmentation

Cited by: 0
Authors
Liu, Zhijian [1, 2]
Zhang, Zhuoyang [3 ]
Khaki, Samir [4 ]
Yang, Shang [1 ]
Tang, Haotian [1 ]
Xu, Chenfeng [5 ]
Keutzer, Kurt [5 ]
Han, Song [1, 2]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] NVIDIA, Cambridge, MA 02138 USA
[3] Tsinghua Univ, Beijing, Peoples R China
[4] Univ Toronto, Toronto, ON, Canada
[5] Univ Calif Berkeley, Berkeley, CA USA
Source
COMPUTER VISION - ECCV 2024, PT LXVII | 2025 / Vol. 15125
Funding
US National Science Foundation;
DOI
10.1007/978-3-031-72855-6_7
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic segmentation empowers numerous real-world applications, such as autonomous driving and augmented/mixed reality. These applications often operate on high-resolution images (e.g., 8 megapixels) to capture fine details. However, this comes at the cost of considerable computational complexity, hindering deployment in latency-sensitive scenarios. In this paper, we introduce SparseRefine, a novel approach that enhances dense low-resolution predictions with sparse high-resolution refinements. Based on coarse low-resolution outputs, SparseRefine first uses an entropy selector to identify a sparse set of pixels with high entropy. It then employs a sparse feature extractor to efficiently generate the refinements for those pixels of interest. Finally, it leverages a gated ensembler to apply these sparse refinements to the initial coarse predictions. SparseRefine can be seamlessly integrated into any existing semantic segmentation model, whether CNN- or ViT-based. SparseRefine achieves a significant speedup of 1.5 to 3.7 times when applied to HRNet-W48, SegFormer-B5, Mask2Former-T/L, and SegNeXt-L on Cityscapes, with negligible to no loss of accuracy. Our "dense+sparse" paradigm paves the way for efficient high-resolution visual computing.
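The three stages named in the abstract (entropy selection, sparse refinement, gated merging) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the flattened per-pixel logit lists, the fixed scalar gate (the paper's ensembler is described only as "gated"), and the stand-in refinement values are all assumptions made for illustration, and the low-resolution/high-resolution split is omitted.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one pixel's class logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_uncertain(coarse_logits, threshold):
    # Hypothetical "entropy selector": indices of pixels whose
    # predictive entropy exceeds the threshold.
    return [i for i, logits in enumerate(coarse_logits)
            if entropy(softmax(logits)) > threshold]

def gated_merge(coarse_logits, refined, gate):
    # Hypothetical "gated ensembler": blend refinements into the coarse
    # logits only at the selected pixels. A fixed scalar gate is an
    # assumption; a real system would likely learn it.
    merged = [list(logits) for logits in coarse_logits]
    for i, new_logits in refined.items():
        merged[i] = [(1.0 - gate) * c + gate * r
                     for c, r in zip(coarse_logits[i], new_logits)]
    return merged

# Toy "image" flattened to 3 pixels with 2 classes each.
coarse = [[4.0, 0.0], [0.1, 0.0], [0.0, 5.0]]
picked = select_uncertain(coarse, threshold=0.5)

# Stand-in for the sparse feature extractor: made-up refined logits
# produced only for the selected pixels.
refined = {i: [0.0, 3.0] for i in picked}
merged = gated_merge(coarse, refined, gate=0.5)
```

Only the middle pixel, whose two classes are nearly tied, is selected; the confident pixels keep their coarse logits unchanged, which is where the savings of a "dense+sparse" design would come from.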
Pages: 108-127
Page count: 20