Improving Semantic Segmentation via Decoupled Body and Edge Supervision

Cited: 210
Authors
Li, Xiangtai [1 ]
Li, Xia [1 ,2 ]
Zhang, Li [3 ]
Cheng, Guangliang [4 ]
Shi, Jianping [4 ]
Lin, Zhouchen [1 ]
Tan, Shaohua [1 ]
Tong, Yunhai [1 ]
Affiliations
[1] Peking Univ, Sch EECS, Key Lab Machine Percept, MOE, Beijing, Peoples R China
[2] Zhejiang Lab, Hangzhou, Peoples R China
[3] Univ Oxford, Dept Engn Sci, Oxford, England
[4] SenseTime Res, Hong Kong, Peoples R China
Source
COMPUTER VISION - ECCV 2020, PT XVII | 2020, Vol. 12362
Keywords
Semantic segmentation; Edge supervision; Flow field; Multi-task learning;
DOI
10.1007/978-3-030-58520-4_26
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing semantic segmentation approaches either aim to improve an object's inner consistency by modeling the global context, or refine object details along boundaries by multi-scale feature fusion. In this paper, a new paradigm for semantic segmentation is proposed. Our insight is that strong semantic segmentation performance requires explicitly modeling the object body and edge, which correspond to the high- and low-frequency components of the image. To do so, we first warp the image feature by learning a flow field to make the object part more consistent. The resulting body feature and the residual edge feature are further optimized under decoupled supervision by explicitly sampling pixels from the different parts (body or edge). We show that the proposed framework, combined with various baselines or backbone networks, leads to better object inner consistency and object boundaries. Extensive experiments on four major road-scene semantic segmentation benchmarks, including Cityscapes, CamVid, KITTI, and BDD, show that our proposed approach establishes a new state of the art while retaining high inference efficiency. In particular, we achieve 83.7% mIoU on Cityscapes with only fine-annotated data. Code and models are made available to foster further research (https://github.com/lxtGH/DecoupleSegNets).
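The core decomposition the abstract describes — warping a feature map along a learned flow field to obtain a smooth "body" feature, with the "edge" feature defined as the residual — can be illustrated with a minimal bilinear-warping sketch. This is an assumption-laden toy version in NumPy (the paper learns the flow on deep CNN features and trains with decoupled losses; the function name `warp_with_flow` and the zero flow field here are purely illustrative):

```python
import numpy as np

def warp_with_flow(feat, flow):
    """Bilinearly sample a feature map at positions offset by a flow field.

    feat: (H, W, C) feature map; flow: (H, W, 2) per-pixel (dy, dx) offsets.
    Offsets pointing toward object interiors yield a smoother "body" feature.
    """
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sampling coordinates, clamped to stay inside the image.
    sy = np.clip(ys + flow[..., 0], 0, H - 1)
    sx = np.clip(xs + flow[..., 1], 0, W - 1)
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = (sy - y0)[..., None], (sx - x0)[..., None]
    # Bilinear interpolation over the four neighbouring feature vectors.
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

feat = np.random.rand(8, 8, 4).astype(np.float32)
flow = np.zeros((8, 8, 2), dtype=np.float32)  # learned by a network in practice
body = warp_with_flow(feat, flow)   # low-frequency "body" feature
edge = feat - body                  # residual high-frequency "edge" feature
```

With a zero flow field the warp is the identity, so the edge residual vanishes; a trained flow instead pulls samples toward object centers, concentrating high-frequency content in `edge`, where the decoupled edge supervision is applied.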
Pages: 435-452
Page count: 18