Noisy-LSTM: Improving Temporal Awareness for Video Semantic Segmentation

Cited by: 17

Authors:

Wang, Bowen ^{[1]}

Li, Liangzhi ^{[1]}

Nakashima, Yuta ^{[1]}

Kawasaki, Ryo ^{[2]}

Nagahara, Hajime ^{[1]}

Yagi, Yasushi ^{[3]}

Affiliations:

[1] Osaka Univ, Inst Databil Sci IDS, Suita, Osaka 5650871, Japan

[2] Osaka Univ, Grad Sch Med, Suita, Osaka 5650871, Japan

[3] Osaka Univ, Inst Sci & Ind Res, Ibaraki 5670047, Japan

Source:

IEEE ACCESS | 2021 / Vol. 9

Keywords:

Training; Feature extraction; Noise measurement; Semantics; Data models; Image segmentation; Computational modeling; Video semantic segmentation; noisy training; temporal awareness;

DOI:

10.1109/ACCESS.2021.3067928

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Semantic video segmentation is a key challenge for various applications. This paper presents a new model named Noisy-LSTM, which is trainable end to end and uses convolutional LSTMs (ConvLSTMs) to leverage the temporal coherence in video frames, together with a simple yet effective training strategy that replaces a frame in a given video sequence with noise. Our training strategy spoils the temporal coherence in video frames and thus makes the temporal links in ConvLSTMs unreliable; this may consequently improve the model's ability to extract features from video frames and serve as a regularizer to avoid overfitting, without requiring extra data annotations or computational costs. Experimental results demonstrate that the proposed model can achieve state-of-the-art performance on both the CityScapes and EndoVis2018 datasets. The code for the proposed method is available at https://github.com/wbw520/NoisyLSTM.
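The frame-replacement idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The abstract does not say how the noise is generated, how often a frame is replaced, or which position in the sequence is chosen, so the uniform noise, the `noise_prob` parameter, the uniformly random position, and the function names below are all assumptions made for illustration. Frames are modelled as flat lists of floats to keep the example free of dependencies.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def replace_frame_with_noise(
    frames: Sequence[Frame],
    make_noise: Callable[[Frame, random.Random], Frame],
    noise_prob: float = 0.5,  # assumed value, not taken from the paper
    rng: Optional[random.Random] = None,
) -> List[Frame]:
    """Return a copy of `frames` in which at most one frame is replaced by noise."""
    rng = rng or random.Random()
    out = list(frames)
    # Leave the sequence untouched when it is empty or the coin flip fails.
    if not out or rng.random() >= noise_prob:
        return out
    idx = rng.randrange(len(out))  # position chosen uniformly (assumption)
    out[idx] = make_noise(out[idx], rng)
    return out


def uniform_noise_like(frame: List[float], rng: random.Random) -> List[float]:
    """Hypothetical noise source: uniform values in [0, 1), same length as the frame."""
    return [rng.random() for _ in frame]


# Toy sequence: four "frames", each flattened to three pixel values.
clip = [[0.0, 0.0, 0.0] for _ in range(4)]
noisy_clip = replace_frame_with_noise(
    clip, uniform_noise_like, noise_prob=1.0, rng=random.Random(0)
)
# Indices of frames that differ from the original clip.
changed = [i for i, (a, b) in enumerate(zip(clip, noisy_clip)) if a != b]
print(changed)
```

In a real training pipeline the same replacement would presumably be applied to image tensors before they are fed to the ConvLSTM, with the segmentation labels left unchanged; that detail is also an assumption here, not something the abstract states.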


Pages: 46810-46820

Number of pages: 11

36 references in total
