Noisy-LSTM: Improving Temporal Awareness for Video Semantic Segmentation

Cited by: 17
Authors
Wang, Bowen [1]
Li, Liangzhi [1]
Nakashima, Yuta [1]
Kawasaki, Ryo [2]
Nagahara, Hajime [1]
Yagi, Yasushi [3]
Affiliations
[1] Osaka Univ, Inst Databil Sci IDS, Suita, Osaka 5650871, Japan
[2] Osaka Univ, Grad Sch Med, Suita, Osaka 5650871, Japan
[3] Osaka Univ, Inst Sci & Ind Res, Ibaraki 5670047, Japan
Keywords
Training; Feature extraction; Noise measurement; Semantics; Data models; Image segmentation; Computational modeling; Video semantic segmentation; noisy training; temporal awareness
DOI
10.1109/ACCESS.2021.3067928
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Semantic video segmentation is a key challenge for various applications. This paper presents a new model named Noisy-LSTM, which is trainable in an end-to-end manner and uses convolutional LSTMs (ConvLSTMs) to leverage the temporal coherence in video frames, together with a simple yet effective training strategy that replaces a frame in a given video sequence with noise. This training strategy spoils the temporal coherence in video frames and thus makes the temporal links in ConvLSTMs unreliable; this may consequently improve the ability of the model to extract features from video frames and serve as a regularizer to avoid overfitting, without requiring extra data annotations or computational cost. Experimental results demonstrate that the proposed model can achieve state-of-the-art performance on both the CityScapes and EndoVis2018 datasets. The code for the proposed method is available at https://github.com/wbw520/NoisyLSTM.
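The frame-replacement idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); the function name `replace_frame_with_noise`, the `noise_prob` parameter, the choice of uniform noise in [0, 1), and the representation of a frame as a nested list are all assumptions made for illustration only.

```python
import random


def replace_frame_with_noise(frames, noise_prob=0.5, rng=None):
    """Return a copy of `frames` in which, with probability `noise_prob`,
    one randomly chosen frame is replaced by noise of the same shape.

    `frames` is a list of H x W frames (nested lists of floats).
    Uniform noise and a single replaced frame are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    # Copy every frame so the caller's clip is left untouched.
    out = [[row[:] for row in frame] for frame in frames]
    if frames and rng.random() < noise_prob:
        idx = rng.randrange(len(frames))
        height = len(frames[idx])
        width = len(frames[idx][0]) if height else 0
        out[idx] = [[rng.random() for _ in range(width)] for _ in range(height)]
    return out
```

In a training loop, such a function would be applied to each input clip before it is passed to the recurrent model, while the segmentation labels stay unchanged; that placement is likewise an assumption about how the strategy might be wired in.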
Pages: 46810-46820
Number of pages: 11