TrSeg: Transformer for semantic segmentation

Times cited: 71

Authors:

Jin, Youngsaeng [1]

Han, David [2]

Ko, Hanseok [1]

Affiliations:

[1] Korea Univ, Sch Elect Engn, 145 Anam Ro, Seoul 02841, South Korea

[2] Drexel Univ, Dept Elect & Comp Engn, 3141 Chestnut St, Philadelphia, PA 19104 USA

Source:

PATTERN RECOGNITION LETTERS | 2021 / Vol. 148

Keywords:

Semantic segmentation; Scene understanding; Transformer; Multi-scale contextual information

DOI:

10.1016/j.patrec.2021.04.024

CLC number (Chinese Library Classification):

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent efforts in semantic segmentation using deep learning frameworks have made notable advances. However, capturing objects that appear in an image at multiple scales remains a challenge. In this paper, we address the semantic segmentation task based on the transformer architecture. Unlike existing methods that capture multi-scale contextual information by fusing each single-scale piece of information from parallel paths, we propose a novel semantic segmentation network incorporating a transformer (TrSeg) to adaptively capture multi-scale information with dependencies on the original contextual information. Given the original contextual information as keys and values, the multi-scale contextual information from the multi-scale pooling module, used as queries, is transformed by the transformer decoder. The experimental results show that TrSeg outperforms other methods of capturing multi-scale information by large margins. (c) 2021 Elsevier B.V. All rights reserved.
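The query/key/value arrangement described in the abstract can be illustrated with a small sketch. This is a hypothetical, pure-Python illustration, not the authors' implementation: it assumes a PPM-style adaptive average pooling with bin sizes (1, 2, 4) as the multi-scale pooling module, and it reduces the transformer decoder to a single head of scaled dot-product cross-attention with identity projections (no learned weights, residual connections, normalization, or feed-forward layer). The helper names `adaptive_avg_pool`, `cross_attention`, and `trseg_style_context` are invented for this sketch.

```python
import math
from typing import List, Sequence

Vector = List[float]
FeatureMap = List[List[Vector]]  # indexed as fmap[row][col][channel]


def adaptive_avg_pool(fmap: FeatureMap, bins: int) -> List[Vector]:
    """Average-pool an H x W x C map into bins x bins cells, returned row-major.

    Cell boundaries use floor/ceil like PPM-style adaptive pooling
    (an assumption about the paper's multi-scale pooling module).
    """
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    cells: List[Vector] = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            pixels = [fmap[r][q] for r in range(r0, r1) for q in range(c0, c1)]
            cells.append([sum(p[k] for p in pixels) / len(pixels) for k in range(c)])
    return cells


def cross_attention(queries: Sequence[Vector], keys: Sequence[Vector],
                    values: Sequence[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out: List[Vector] = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(wt * v[k] for wt, v in zip(weights, values)) / total
                    for k in range(len(values[0]))])
    return out


def trseg_style_context(fmap: FeatureMap,
                        bin_sizes: Sequence[int] = (1, 2, 4)) -> List[Vector]:
    """Hypothetical: multi-scale pooled tokens (queries) attend over the
    flattened original features (keys and values), identity projections only."""
    original = [pixel for row in fmap for pixel in row]  # H*W context tokens
    queries = [cell for b in bin_sizes for cell in adaptive_avg_pool(fmap, b)]
    return cross_attention(queries, original, original)


if __name__ == "__main__":
    # Toy 4 x 4 map with C = 2 channels: each pixel stores (row, col).
    fmap = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
    ctx = trseg_style_context(fmap)
    print(len(ctx), len(ctx[0]))  # 21 2  (1 + 4 + 16 query tokens, C channels)
```

In this sketch each pooled token attends over all H x W original feature vectors, so every output token is a re-weighted mixture of the original context rather than an independently computed single-scale feature. A real model would add learned query/key/value projections and multiple heads.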


Pages: 29-35

Number of pages: 7

Number of references: 32
