TrSeg: Transformer for semantic segmentation

Cited by: 71
Authors
Jin, Youngsaeng [1]
Han, David [2]
Ko, Hanseok [1]
Affiliations
[1] Korea Univ, Sch Elect Engn, 145 Anam Ro, Seoul 02841, South Korea
[2] Drexel Univ, Dept Elect & Comp Engn, 3141 Chestnut St, Philadelphia, PA 19104 USA
Keywords
Semantic segmentation; Scene understanding; Transformer; Multi-scale contextual information
DOI
10.1016/j.patrec.2021.04.024
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent efforts in semantic segmentation using deep learning frameworks have made notable advances. However, capturing the existence of objects in an image at multiple scales still remains a challenge. In this paper, we address the semantic segmentation task based on transformer architecture. Unlike existing methods that capture multi-scale contextual information through infusing every single-scale piece of information from parallel paths, we propose a novel semantic segmentation network incorporating a transformer (TrSeg) to adaptively capture multi-scale information with the dependencies on original contextual information. Given the original contextual information as keys and values, the multi-scale contextual information from the multi-scale pooling module as queries is transformed by the transformer decoder. The experimental results show that TrSeg outperforms the other methods of capturing multi-scale information by large margins. © 2021 Elsevier B.V. All rights reserved.
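The query/key/value arrangement described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it uses a single attention head and leaves out the learned projections, feed-forward layers, normalization, and positional encodings that a full transformer decoder would have. The pooling grid sizes (1, 2, 3, 6), the feature dimensions, and the function names `adaptive_avg_pool` and `cross_attention` are assumptions chosen for illustration only.

```python
import numpy as np

def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map onto a (C, bins, bins) grid."""
    c, h, w = feat.shape
    out = np.empty((c, bins, bins))
    for i in range(bins):
        r0 = (i * h) // bins
        r1 = ((i + 1) * h + bins - 1) // bins  # ceiling division
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            out[:, i, j] = feat[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, memory):
    """Single-head scaled dot-product attention.

    queries: (Nq, C) multi-scale pooled tokens.
    memory:  (Nk, C) original feature tokens, used as both keys and values.
    """
    d = queries.shape[-1]
    weights = softmax(queries @ memory.T / np.sqrt(d), axis=-1)
    return weights @ memory, weights

# Hypothetical backbone output: 16 channels on an 8x8 grid.
rng = np.random.default_rng(0)
C, H, W = 16, 8, 8
feat = rng.standard_normal((C, H, W))

# Queries: pooled tokens from several grid sizes (assumed values).
bins = (1, 2, 3, 6)
queries = np.concatenate(
    [adaptive_avg_pool(feat, b).reshape(C, -1).T for b in bins]
)

# Keys and values: one token per spatial position of the original map.
memory = feat.reshape(C, -1).T

out, weights = cross_attention(queries, memory)
print(queries.shape, memory.shape, out.shape)
```

Each pooled token is rewritten as a weighted mix of the original feature tokens, so the contribution of each scale depends on the image content rather than on a fixed fusion of parallel paths.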
Pages: 29-35
Number of pages: 7
Related papers
32 entries in total
[1]  
[Anonymous], 2009, CVPR09
[2]   VQA: Visual Question Answering [J].
Antol, Stanislaw ;
Agrawal, Aishwarya ;
Lu, Jiasen ;
Mitchell, Margaret ;
Batra, Dhruv ;
Zitnick, C. Lawrence ;
Parikh, Devi .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :2425-2433
[3]   Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation [J].
Bilinski, Piotr ;
Prisacariu, Victor .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :6596-6605
[4]  
Carbonell Jaime, 2019, ARXIV PREPRINT ARXIV
[5]  
Carion N., 2020, ARXIV PREPRINT ARXIV
[6]  
Chen L.-C., 2017, ARXIV PREPRINT ARXIV
[7]   Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [J].
Chen, Liang-Chieh ;
Zhu, Yukun ;
Papandreou, George ;
Schroff, Florian ;
Adam, Hartwig .
COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 :833-851
[8]   DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J].
Chen, Liang-Chieh ;
Papandreou, George ;
Kokkinos, Iasonas ;
Murphy, Kevin ;
Yuille, Alan L. .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) :834-848
[9]   The Cityscapes Dataset for Semantic Urban Scene Understanding [J].
Cordts, Marius ;
Omran, Mohamed ;
Ramos, Sebastian ;
Rehfeld, Timo ;
Enzweiler, Markus ;
Benenson, Rodrigo ;
Franke, Uwe ;
Roth, Stefan ;
Schiele, Bernt .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3213-3223
[10]  
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171