GMNet: Graded-Feature Multilabel-Learning Network for RGB-Thermal Urban Scene Semantic Segmentation

Cited by: 218

Authors:

Zhou, Wujie [1,2]

Liu, Jinfu [1]

Lei, Jingsheng [1]

Yu, Lu [2]

Hwang, Jenq-Neng [3]

Affiliations:

[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China

[2] Zhejiang Univ, Inst Informat & Commun Engn, Hangzhou 310027, Peoples R China

[3] Univ Washington, Dept Elect Engn, Seattle, WA 98105 USA

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2021 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Image segmentation; Semantics; Feature extraction; Decoding; Temperature sensors; Robot sensing systems; Motion segmentation; RGB-thermal semantic segmentation; graded-features; cross-modal fusion; multilabel-learning; refinement strategy;

DOI:

10.1109/TIP.2021.3109518

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Semantic segmentation is a fundamental task in computer vision, and it has various applications in fields such as robotic sensing, video surveillance, and autonomous driving. A major research topic in urban road semantic segmentation is the proper integration and use of cross-modal information for fusion. Here, we attempt to leverage inherent multimodal information and acquire graded features to develop a novel multilabel-learning network for RGB-thermal urban scene semantic segmentation. Specifically, we propose a strategy for graded-feature extraction to split multilevel features into junior, intermediate, and senior levels. Then, we integrate RGB and thermal modalities with two distinct fusion modules, namely a shallow feature fusion module and deep feature fusion module for junior and senior features. Finally, we use multilabel supervision to optimize the network in terms of semantic, binary, and boundary characteristics. Experimental results confirm that the proposed architecture, the graded-feature multilabel-learning network, outperforms state-of-the-art methods for urban scene semantic segmentation, and it can be generalized to depth data.


Pages: 7790-7802

Page count: 13

