STC-GAN: Spatio-Temporally Coupled Generative Adversarial Networks for Predictive Scene Parsing

Cited by: 25
Authors
Qi, Mengshi [1 ,2 ]
Wang, Yunhong [1 ]
Li, Annan [1 ]
Luo, Jiebo [3 ]
Affiliations
[1] Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Sch Comp Sci & Engn, Beijing 100191, Peoples R China
[2] Ecole Polytech Fed Lausanne, Comp Vis Lab, CH-1015 Lausanne, Switzerland
[3] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA
Keywords
Predictive Scene Parsing; Generative Adversarial Networks; Coupled Architecture; Spatio-Temporal Features;
DOI
10.1109/TIP.2020.2983567
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Predictive scene parsing is the task of assigning pixel-level semantic labels to a future frame of a video. It has many applications in vision-based intelligent systems, e.g., autonomous driving and robot navigation. Although previous work has shown promising performance on semantic segmentation of images and videos, anticipating future scene parsing with limited annotated training data remains quite challenging. In this paper, we propose a novel model called STC-GAN, Spatio-Temporally Coupled Generative Adversarial Networks for predictive scene parsing, which employs both convolutional neural networks and convolutional long short-term memory (LSTM) in an encoder-decoder architecture. In STC-GAN, the spatial encoder effectively captures both spatial layout and semantic context, while the temporal encoder accurately extracts motion dynamics. Furthermore, a coupled architecture establishes joint adversarial training, in which weights are shared and features are adaptively transformed between the future frame generation model and the predictive scene parsing model. Consequently, the proposed STC-GAN is able to learn valuable features from unlabeled video data. We evaluate STC-GAN on two public datasets, i.e., Cityscapes and CamVid. Experimental results demonstrate that our method outperforms the state-of-the-art.
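The coupling idea sketched in the abstract (a spatial CNN encoder, a ConvLSTM temporal encoder, and decoder weights shared between the future frame generator and the parsing predictor) can be illustrated in a few lines of PyTorch. The sketch below is a hypothetical illustration, not the authors' implementation: the module names, channel widths, layer counts, and the single shared decoder trunk are all assumptions, and the adversarial discriminators and the adaptive feature transformation described in the abstract are omitted.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    # Minimal convolutional LSTM cell: the temporal-encoder building block.
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class CoupledGenerator(nn.Module):
    # Spatial CNN encoder + ConvLSTM temporal encoder, followed by a single
    # decoder trunk whose weights are shared by two heads: one predicting the
    # future RGB frame, the other the future parsing (segmentation) map.
    def __init__(self, n_classes=19, hid=64):  # 19 classes, as in Cityscapes
        super().__init__()
        self.hid = hid
        self.spatial = nn.Sequential(            # spatial layout and context
            nn.Conv2d(3, hid, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hid, hid, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.temporal = ConvLSTMCell(hid, hid)   # motion dynamics over time
        self.decoder = nn.Sequential(            # weights shared by both tasks
            nn.ConvTranspose2d(hid, hid, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(hid, hid, 4, stride=2, padding=1),
            nn.ReLU(inplace=True))
        self.frame_head = nn.Conv2d(hid, 3, 3, padding=1)
        self.parse_head = nn.Conv2d(hid, n_classes, 3, padding=1)

    def forward(self, frames):                   # frames: (B, T, 3, H, W)
        b, t, _, height, width = frames.shape
        h = frames.new_zeros(b, self.hid, height // 4, width // 4)
        c = torch.zeros_like(h)
        for step in range(t):                    # fold observed frames in time
            h, c = self.temporal(self.spatial(frames[:, step]), (h, c))
        feat = self.decoder(h)
        return torch.tanh(self.frame_head(feat)), self.parse_head(feat)


if __name__ == "__main__":
    gen = CoupledGenerator()
    rgb, seg = gen(torch.randn(2, 4, 3, 64, 64))  # 4 observed frames
    print(rgb.shape, seg.shape)  # (2, 3, 64, 64) and (2, 19, 64, 64)
```

Per the abstract, both outputs would additionally be trained against adversarial discriminators, which is what allows unlabeled video (which supervises only the frame branch) to also benefit the parsing branch through the shared weights.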
Pages: 5420-5430
Number of Pages: 11