Forest Fire Segmentation via Temporal Transformer from Aerial Images

Cited by: 19
Authors
Shahid, Mohammad [1 ]
Chen, Shang-Fu [1 ]
Hsu, Yu-Ling [2 ]
Chen, Yung-Yao [3 ]
Chen, Yi-Ling [1 ]
Hua, Kai-Lung [1 ]
Affiliations
[1] Natl Taiwan Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taipei 106335, Taiwan
[2] Natl Taiwan Univ Sci & Technol, Dept Ind Management, Taipei 106335, Taiwan
[3] Natl Taiwan Univ Sci & Technol, Dept Elect & Comp Engn, Taipei 106335, Taiwan
Keywords
fire segmentation; vision transformers; deep learning; network
DOI
10.3390/f14030563
CLC Number
S7 [Forestry]
Discipline Codes
0829; 0907
Abstract
Forest fires are among the most destructive natural disasters threatening forest lands and resources. Accurate and early detection of forest fires is essential to reduce losses and improve firefighting. Conventional fire-monitoring techniques, based on ground inspection and limited by their field of view, provide insufficient coverage of large areas. Recently, owing to their excellent flexibility and ability to cover large regions, unmanned aerial vehicles (UAVs) have been used to combat forest fire incidents. An essential step for an autonomous system that monitors fire situations is to first locate the fire in a video. State-of-the-art forest-fire segmentation methods based on vision transformers (ViTs) and convolutional neural networks (CNNs) operate on a single aerial image. However, fire varies in scale and shape, and small fires captured by long-distance cameras lack salient features, so accurate fire segmentation from a single image has been challenging. In addition, CNN-based techniques treat all image pixels equally and overlook global information, limiting their performance, while ViT-based methods suffer from high computational overhead. To address these issues, we proposed a spatiotemporal architecture called FFS-UNet, which exploits temporal information for forest-fire segmentation by integrating a transformer into a modified lightweight UNet model. First, we processed a keyframe and two reference frames through three parallel encoder paths to obtain shallow features and perform feature fusion. Then, we applied a transformer for deep temporal-feature extraction, which enhanced the learning of fire-pixel features and made feature extraction more robust. Finally, we fed the shallow features of the keyframe into the decoder path via skip-connections for de-convolution to segment the fire. We evaluated the model on the UAV-collected video and Corsican Fire datasets.
The proposed FFS-UNet demonstrated enhanced performance with fewer parameters, achieving an F1-score of 95.1% and an IoU of 86.8% on the UAV-collected video, and an F1-score of 91.4% and an IoU of 84.8% on the Corsican Fire dataset, both higher than those of previous forest-fire techniques. Therefore, the proposed FFS-UNet model effectively addresses fire monitoring with UAVs.
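The pipeline described in the abstract (three parallel shallow encoder paths, transformer-based deep temporal-feature extraction, and a decoder with skip-connections from the keyframe) can be sketched roughly as follows. This is a minimal illustrative sketch only: all layer widths, depths, and the token-fusion scheme are assumptions, not the authors' published FFS-UNet configuration.

```python
# Hypothetical sketch of the FFS-UNet idea: three parallel shallow
# encoders (keyframe + two reference frames), a transformer over the
# fused spatiotemporal tokens, and a skip-connected de-convolution
# decoder. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn


class FFSUNetSketch(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # shallow CNN encoder applied to each frame (three parallel paths)
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # transformer for deep temporal-feature extraction over the 3 frames
        layer = nn.TransformerEncoderLayer(d_model=ch, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=1)
        # decoder: de-convolutions over deep features concatenated with the
        # keyframe's shallow features (the skip-connection)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(2 * ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 1, 4, stride=2, padding=1),
        )

    def forward(self, key, ref1, ref2):
        feats = [self.enc(f) for f in (key, ref1, ref2)]  # each: B,C,H,W
        b, c, h, w = feats[0].shape
        # fuse the three frames into one token sequence: 3 * (H*W) tokens
        tokens = torch.cat([f.flatten(2).transpose(1, 2) for f in feats], dim=1)
        fused = self.temporal(tokens)[:, : h * w]  # keep the keyframe tokens
        deep = fused.transpose(1, 2).reshape(b, c, h, w)
        # skip-connect shallow keyframe features into the decoder
        return self.dec(torch.cat([feats[0], deep], dim=1))


# a 64x64 input yields a 64x64 single-channel fire mask (logits)
frames = [torch.rand(1, 3, 64, 64) for _ in range(3)]
mask = FFSUNetSketch()(*frames)
print(tuple(mask.shape))  # (1, 1, 64, 64)
```

In this sketch the keyframe's tokens are simply taken back out after attention; the actual paper's fusion and upsampling details may differ.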
Pages: 23