Generative EO/IR multi-scale vision transformer for improved object detection

Cited by: 0
Authors
Christian, Jonathan [1 ]
Bright, Max [1 ]
Summers, Jason [1 ]
Olson, Ashley [2 ]
Havens, Tim [2 ]
Affiliations
[1] ARiA, 305 S Main St, Madison, VA 22727 USA
[2] Michigan Technol Univ, 100 Phoenix Dr, Houghton, MI 49931 USA
Source
SYNTHETIC DATA FOR ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING: TOOLS, TECHNIQUES, AND APPLICATIONS II | 2024 / Vol. 13035
Keywords
Generative vision transformer; infrared synthesis; object detection; denoising autoencoder training; computer vision; multispectral satellite imagery; synthesis pipelines; integrated machine learning
DOI
10.1117/12.3023596
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
For certain objects, panchromatic or 3-band (RGB) imagery may be insufficient for accurate object identification; additional bands within the infrared (IR) spectrum may therefore be needed to exploit unique spectral characteristics and improve object detection. Most existing generative modeling techniques are applied solely to the visible wavelengths, so a need exists to fully explore the application of generative modeling to multispectral imagery (MSI), and specifically to the IR bands. Generative models used for data augmentation in object detection must have sufficient fidelity to avoid generating data that are out of distribution with respect to actual measured data, or that contain systematic bias or artifacts. This work demonstrates the utility of a conditionally generative, multi-scale vision transformer that learns the spatial and spectral structures, and the interactions between them, in order to accurately synthesize near-infrared (NIR) and short-wave infrared (SWIR) data from RGB. This synthesis is performed over a diverse set of target objects observed across multiple seasons, at multiple look angles, over varying terrains, with images sampled globally from multiple satellites. For both training and inference, the model is given no contextual information or metadata as input. Compared to using RGB alone, the average precision (AP) of an off-the-shelf object detection model (YOLOv5) trained with the additional synthesized IR data improves by up to 48% on a target class that is difficult for an analyst to identify. In conjunction with RGB data, using synthetic rather than true IR data for object detection yields higher AP values across all target classes.
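The pipeline the abstract describes can be sketched at a high level: a generative model maps RGB to synthesized IR channels, which are then stacked with the RGB bands to form the detector input. The sketch below is purely illustrative and is not the authors' code; `synthesize_ir` is a hypothetical placeholder standing in for the paper's conditional multi-scale vision transformer, and the fixed RGB weightings inside it are toy values, not a real spectral model.

```python
# Illustrative sketch (not the authors' implementation): augment an RGB
# image with model-synthesized NIR/SWIR channels before detection.
import numpy as np

def synthesize_ir(rgb: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the generative model: maps an HxWx3 RGB
    image to HxWx2 synthesized (NIR, SWIR) channels. In the paper this
    role is played by a trained conditional multi-scale vision
    transformer; here it is a toy linear combination of the RGB bands."""
    nir = 0.2 * rgb[..., 0] + 0.3 * rgb[..., 1] + 0.5 * rgb[..., 2]
    swir = 0.5 * rgb[..., 0] + 0.3 * rgb[..., 1] + 0.2 * rgb[..., 2]
    return np.stack([nir, swir], axis=-1)

def build_detector_input(rgb: np.ndarray) -> np.ndarray:
    """Stack RGB with the synthesized IR bands into an HxWx5 array, the
    form a 5-channel detector (e.g., a detector with a widened input
    stem) would consume."""
    ir = synthesize_ir(rgb)
    return np.concatenate([rgb, ir], axis=-1)

rgb = np.random.rand(64, 64, 3).astype(np.float32)
x = build_detector_input(rgb)
print(x.shape)  # (64, 64, 5)
```

The design point is that the detector never sees metadata or context, only the stacked pixel channels, mirroring the abstract's statement that the model receives no auxiliary input at training or inference time.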
Pages: 18