Spatio-Temporal Self-Attention Network for Fire Detection and Segmentation in Video Surveillance

Cited by: 30
Authors
Shahid, Mohammad [1]
Virtusio, John Jethro [1]
Wu, Yu-Hsien [1]
Chen, Yung-Yao [2]
Tanveer, M. [3]
Muhammad, Khan [4]
Hua, Kai-Lung [1]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taipei 10633, Taiwan
[2] Natl Taiwan Univ Sci & Technol, Dept Elect & Comp Engn, Taipei 10633, Taiwan
[3] IIT Indore, Discipline Math, Indore 453552, India
[4] Sungkyunkwan Univ, Coll Comp & Informat, Sch Convergence, Visual Analyt Knowledge Lab VIS2KNOW Lab, Seoul 03063, South Korea
Keywords
Fire detection; early detection; disaster management; small-sized fire; video fire segmentation; semi-supervised; real-time fire; object detection
DOI
10.1109/ACCESS.2021.3132787
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Approaches based on Convolutional Neural Networks (CNNs) are popular for image and video tasks because of their state-of-the-art performance. However, for problems such as object detection and segmentation, CNNs still struggle with objects of arbitrary shape and size, occlusions, and varying viewpoints. This limitation makes them largely unsuitable for fire detection and segmentation, since flames can have an unpredictable scale and shape. In this paper, we propose a method that detects and segments fire regions with special consideration of their arbitrary sizes and shapes. Specifically, our approach uses a self-attention mechanism to augment spatial characteristics with temporal features, allowing the network to reduce its reliance on spatial factors such as shape or size and to take advantage of robust spatial-temporal dependencies. Our pipeline has two stages: in the first stage, we extract region proposals using spatial-temporal features, and in the second stage, we classify each region proposal as flame or non-flame. Because large fire datasets are scarce, we adopt a transfer learning strategy and pre-train our classifier on the ImageNet dataset. Additionally, our Spatial-Temporal Network requires only semi-supervision: it needs just one ground-truth segmentation mask per input frame sequence. In our experiments, the proposed method significantly outperforms state-of-the-art fire detection methods, with a 2-4% relative improvement in F1-score for large-scale fires and a nearly 60% relative improvement for small fires at a very early stage.
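As a rough illustration of the self-attention idea mentioned in the abstract, the sketch below flattens a short clip of per-location feature vectors into one token sequence and applies plain scaled dot-product self-attention, so that each location can draw on features from other locations and other frames. This is not the authors' network: the identity query/key/value projections, the toy feature values, and the names `flatten_clip` and `self_attention` are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_clip(clip):
    """Flatten clip[t][y][x] (a feature vector per location) into a token list.

    Tokens from all frames end up in one sequence, which is what lets the
    attention below mix spatial and temporal information.
    """
    return [vec for frame in clip for row in frame for vec in row]


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a trained model would learn these projections.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(row[j] * tokens[j][c] for j in range(len(tokens)))
             for c in range(dim)]
        )
    return outputs, weights


# Hypothetical toy clip: 2 frames, a 1x2 spatial grid, 2-dimensional features.
clip = [
    [[[1.0, 0.0], [0.0, 1.0]]],  # frame 0
    [[[0.9, 0.1], [0.0, 1.0]]],  # frame 1
]
tokens = flatten_clip(clip)
outputs, weights = self_attention(tokens)
```

In this toy input, the first location in frame 0 attends more strongly to the similar location in frame 1 than to its dissimilar spatial neighbour, which is the kind of temporal dependency the abstract describes in general terms.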
Pages: 1259-1275
Page count: 17