Wildfire Segmentation Using Deep Vision Transformers

Cited by: 65
Authors
Ghali, Rafik [1,2]
Akhloufi, Moulay A. [1]
Jmal, Marwa [3]
Mseddi, Wided Souidene [2]
Attia, Rabah [2]
Affiliations
[1] Univ Moncton, Percept Robot & Intelligent Machines Res Grp PRIM, Dept Comp Sci, 18 Antonine Maillet Ave, Moncton, NB E1A 3E9, Canada
[2] Univ Carthage, Ecole Polytech Tunisie, SERCOM Lab, La Marsa 77-1054, Carthage, Tunisia
[3] Telnet Holding, Telnet Innovat Labs, Parc Elghazela Technol Commun, Ariana 2088, Tunisia
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
forest fire detection; fire segmentation; vision Transformer; TransUNet; MedT; wildfires
DOI
10.3390/rs13173527
CLC Number
X [Environmental Science, Safety Science];
Subject Classification Code
08; 0830
Abstract
In this paper, we address the early detection and segmentation of forest fires in order to predict their spread and assist firefighting. Techniques based on Convolutional Neural Networks (CNNs) are the most widely used and have proven efficient at this task. However, they remain limited in modeling long-range relationships between objects in an image, due to the intrinsic locality of convolution operators. To overcome this drawback, Transformers, originally designed for sequence-to-sequence prediction, have emerged as alternative architectures; they capture global dependencies between input and output sequences through the self-attention mechanism. In this context, we present the first study exploring the potential of vision Transformers for forest fire segmentation. Two vision Transformers are used, TransUNet and MedT. We design two frameworks based on these image Transformers, adapt them to our complex, unstructured environment, evaluate them with varying backbones, and optimize them for forest fire segmentation. Extensive evaluations of both frameworks revealed performance superior to current methods. The proposed approaches achieved state-of-the-art results, with an F1-score of 97.7% for the TransUNet architecture and 96.0% for MedT. Analysis of the results showed that these models reduce fire-pixel misclassifications thanks to the extraction of both global and local features, which provides finer detection of the fire's shape.
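To make the patch-based self-attention idea concrete, below is a minimal PyTorch sketch. It is not the authors' TransUNet or MedT implementation; the model class, hyperparameters, and the pixel-wise F1 helper are illustrative assumptions. The image is split into patches, self-attention is applied over the patch sequence to capture global dependencies (the property CNN locality lacks), and a lightweight convolutional head upsamples the tokens back to a per-pixel fire mask.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyViTSegmenter(nn.Module):
    # Illustrative ViT-style encoder with a convolutional upsampling head;
    # a simplified stand-in for the TransUNet/MedT family, not their code.
    def __init__(self, img_size=224, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        self.grid = img_size // patch                      # patches per side
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Sequential(
            nn.Conv2d(dim, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1))                           # one logit: fire vs. background

    def forward(self, x):                                  # x: (B, 3, H, W)
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim) patch tokens
        tokens = self.encoder(tokens + self.pos)           # self-attention over all patches
        feat = tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        logits = self.head(feat)                           # coarse (grid x grid) fire logits
        return F.interpolate(logits, size=x.shape[2:],
                             mode="bilinear", align_corners=False)

def f1_score(logits, target, eps=1e-7):
    # Pixel-wise F1 between thresholded prediction and binary ground-truth mask,
    # the metric the paper reports (97.7% TransUNet, 96.0% MedT).
    pred = (logits > 0).float()                            # sigmoid(0) = 0.5 threshold
    tp = (pred * target).sum()
    precision = tp / (pred.sum() + eps)
    recall = tp / (target.sum() + eps)
    return 2 * precision * recall / (precision + recall + eps)

model = TinyViTSegmenter()
x = torch.randn(2, 3, 224, 224)                            # a batch of RGB images
mask_logits = model(x)                                     # (2, 1, 224, 224) per-pixel logits

Because every patch token attends to every other, a flame fragment in one corner can inform the classification of smoke-obscured pixels elsewhere in the frame, which is the global-context advantage the abstract attributes to Transformer-based segmentation.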
Pages: 24