Fire Detection Approach Based on Vision Transformer

Cited by: 8
Authors
Khudayberdiev, Otabek [1]
Zhang, Jiashu [1]
Elkhalil, Ahmed [1]
Balde, Lansana [1]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu, Peoples R China
Source
ARTIFICIAL INTELLIGENCE AND SECURITY, ICAIS 2022, PT I | 2022, Vol. 13338
Keywords
Vision transformer; Self-attention; Convolutional neural networks; Fire detection; Image classification; Surveillance
DOI
10.1007/978-3-031-06794-5_4
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Given the rapid spread of embedded surveillance video systems for fire monitoring, we need to deploy systems with both high accuracy and high detection speed. Recent vision-based fire detection techniques have achieved remarkable success through the power of deep convolutional neural networks, and CNNs have long been the architecture of choice for computer vision tasks. However, current CNN-based methods treat all image pixels as equally important for fire classification, ignoring their contextual relevance, which can lower accuracy and delay detection. To increase detection speed and achieve high accuracy, we propose a fire detection approach based on the Vision Transformer as a viable alternative to CNNs. Unlike convolutional networks, transformers process an image as a sequence of patches and selectively attend to different image regions based on context. In addition, the transformer's attention mechanism copes well with small flames, enabling detection of fires at an early stage. Because global self-attention in transformers is computationally expensive, we use a fine-tuned Swin Transformer as our backbone architecture, which computes self-attention within local windows and thereby remains tractable for classifying high-resolution images. Experimental results on an image fire dataset demonstrate the promising capability of the model compared with state-of-the-art methods; specifically, the Vision Transformer obtains a classification accuracy of 98.54% on a publicly available dataset.
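As a rough illustration of the approach described in the abstract, the following minimal sketch fine-tunes an ImageNet-pretrained Swin Transformer for binary fire / non-fire image classification. It assumes the timm (pytorch-image-models) library; the model variant, dataset layout, and hyperparameters are illustrative assumptions, not the authors' actual configuration.

import timm
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Load a Swin Transformer pre-trained on ImageNet and replace its head
# with a 2-class (fire / non-fire) classifier. Windowed self-attention
# is built into the Swin architecture itself.
model = timm.create_model("swin_base_patch4_window7_224",
                          pretrained=True, num_classes=2)

# Standard ImageNet preprocessing; this Swin variant expects 224x224 inputs.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical dataset layout: fire_dataset/train/{fire,non_fire}/*.jpg
train_set = datasets.ImageFolder("fire_dataset/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative fine-tuning epoch; the paper's actual training
# schedule may differ.
model.train()
for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()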
Pages: 41-53
Page count: 13