TSTR: A Real-Time RGB-Thermal Semantic Segmentation Model with Multimodal Fusion Transformers

Citations: 0
Authors
Zhao, Guogiang [1]
Yan, Xiaoyun [1]
Cui, Aodie [1]
Hu, Chang [1]
Bao, Jiaqi [1]
Huang, Junjie [1]
Affiliations
[1] Wuhan Textile University, School of Computer Science and Artificial Intelligence, Wuhan, People's Republic of China
Source
2023 19TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN 2023 | 2023
Keywords
RGB-T Semantic Segmentation; Vision Transformers; Real-time Processing; Information Fusion; NETWORK;
DOI
10.1109/MSN60784.2023.00088
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic segmentation plays a crucial role in scene understanding, particularly in autonomous driving. Traditional methods rely mainly on single-modal RGB images, which perform well under good daytime lighting but poorly in challenging conditions such as nighttime. Recent research has begun feeding thermal images from thermal cameras into deep neural networks as an additional modality, an approach known as RGB-T semantic segmentation. However, existing approaches are constrained by the limited receptive field of CNNs and often rely on complex fusion modules to improve accuracy, neglecting the model's inference time. To address these issues, we propose a real-time multimodal fusion network (TSTR) for RGB-T semantic segmentation. Specifically, taking advantage of the global self-attention and long-range dependency modeling of transformers, we design a dual-branch transformer encoder that extracts RGB and thermal features separately. To ensure real-time performance, we introduce more efficient self-attention blocks and employ a lightweight MLP decoder to aggregate information from different levels for dense prediction. Additionally, we use a simple element-wise addition for multimodal feature fusion. Extensive quantitative and qualitative experiments demonstrate that the proposed network outperforms several state-of-the-art real-time RGB-T semantic segmentation methods. It achieves an mIoU of 56.2% on benchmark datasets while running in real time at 59 FPS with a 480×640 input on an NVIDIA GeForce RTX 2080 Ti GPU. The method is significantly faster than existing networks while achieving better segmentation accuracy.
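For readers unfamiliar with the accuracy metric quoted in the abstract, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed for a single label map. It is not taken from the paper: the function name `mean_iou`, the toy 2×2 inputs, and the choice to skip classes absent from both maps are illustrative assumptions, and benchmark protocols usually accumulate counts over the whole test set before averaging.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth.

    Hypothetical helper for illustration; not the evaluation code of TSTR.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel belongs to both the predicted and true region of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel enlarges the union of the predicted class and of the true class.
                union[p] += 1
                union[t] += 1
    # Assumption: classes with an empty union are left out of the average.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x2 label maps with two classes (made-up data).
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=2)
print(round(score, 4))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.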
Pages: 588-595
Page count: 8
Related papers
31 in total
[1]   SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J].
Badrinarayanan, Vijay ;
Kendall, Alex ;
Cipolla, Roberto .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (12) :2481-2495
[2]   ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation [J].
Cao, Jinming ;
Leng, Hanchao ;
Lischinski, Dani ;
Cohen-Or, Danny ;
Tu, Changhe ;
Li, Yangyan .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :7068-7077
[3]  
Chen LC, 2016, Arxiv, DOI [arXiv:1412.7062, 10.48550/arXiv.1412.7062]
[4]  
Chen LC, 2017, Arxiv, DOI arXiv:1706.05587
[5]   Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [J].
Chen, Liang-Chieh ;
Zhu, Yukun ;
Papandreou, George ;
Schroff, Florian ;
Adam, Hartwig .
COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 :833-851
[6]   DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J].
Chen, Liang-Chieh ;
Papandreou, George ;
Kokkinos, Iasonas ;
Murphy, Kevin ;
Yuille, Alan L. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) :834-848
[7]   FEANet: Feature-Enhanced Attention Network for RGB-Thermal Real-time Semantic Segmentation [J].
Deng, Fuqin ;
Feng, Hua ;
Liang, Mingjian ;
Wang, Hongmin ;
Yang, Yong ;
Gao, Yuan ;
Chen, Junfeng ;
Hu, Junjie ;
Guo, Xiyue ;
Lam, Tin Lun .
2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, :4467-4473
[8]  
Dosovitskiy A, 2021, Arxiv, DOI arXiv:2010.11929
[9]   Rethinking BiSeNet For Real-time Semantic Segmentation [J].
Fan, Mingyuan ;
Lai, Shenqi ;
Huang, Junshi ;
Wei, Xiaoming ;
Chai, Zhenhua ;
Luo, Junfeng ;
Wei, Xiaolin .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :9711-9720
[10]  
Ha Q, 2017, IEEE INT C INT ROBOT, P5108, DOI 10.1109/IROS.2017.8206396