SwinSight: a hierarchical vision transformer using shifted windows to leverage aerial image classification

Authors
Pradhan P.K. [1, 2]
Das A. [3]
Kumar A. [3]
Baruah U. [4]
Sen B. [1]
Ghosal P. [1]
Affiliations
[1] Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, Sikkim, Majitar
[2] Centre for Computers and Communication Technology, Sikkim, Chisopani, South Sikkim
[3] CVPR Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, West Bengal, Kolkata
[4] Birangana Sati Sadhani Rajyik Vishwavidyalaya, Assam, Golaghat
Keywords
Aerial image classification; Convolution neural network; DCT-DWT-FFT; Deep learning; Swin transformer
DOI
10.1007/s11042-024-19615-9
Abstract
In aerial image classification, integrating advanced vision transformers with optimal preprocessing techniques is pivotal for enhancing model performance. This study presents SwinSight, a novel hierarchical vision transformer optimized for aerial image classification, which addresses the computational challenges typically associated with transformers through a shifted window mechanism. The research centers on improving model performance through a systematic preprocessing approach that combines the Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Fast Fourier Transform (FFT). An extensive ablation study evaluates all six permutations of these techniques to identify the most effective preprocessing sequence. Results indicate that applying DCT, then DWT, then FFT performs best, achieving a classification accuracy of 93.16% with an inference time of 0.0049 seconds per frame. This result highlights the critical role of preprocessing order in optimizing feature extraction and, in turn, the efficacy of the classification process. SwinSight’s advancements not only set a new benchmark for aerial image analysis but also offer broader implications for enhancing image processing workflows in various applications, contributing to theoretical insights and practical improvements in image-based machine learning tasks. The paper offers a practical solution for aerial image classification in applications such as agriculture, environmental monitoring, land use, and security, and also presents a novel dataset, SAIOD (Sikkim Aerial Images dataset for Object Detection), to the computer vision research community to foster further advances. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
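The ordering experiment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the three transforms are chained, so the code below assumes each transform is simply applied to the output of the previous one, works on a 1-D signal rather than a 2-D image, uses a one-level Haar wavelet for the DWT, and takes the magnitude of a naive DFT in place of an FFT. The names `dct`, `dwt`, `fft`, `preprocess`, `TRANSFORMS`, and `ORDERS` are hypothetical.

```python
import cmath
import math
from itertools import permutations


def dct(x):
    """Naive, unnormalized DCT-II of a real sequence."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dwt(x):
    """One-level Haar DWT (assumed wavelet): approximation then detail coefficients.

    Expects an even-length input; a trailing odd sample would be dropped.
    """
    s = math.sqrt(2.0)
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx + detail


def fft(x):
    """Magnitude spectrum from a naive O(n^2) DFT, standing in for an FFT.

    Taking the magnitude is an assumption made so the output stays real-valued.
    """
    n = len(x)
    return [
        abs(sum(x[i] * cmath.exp(-2j * math.pi * i * k / n) for i in range(n)))
        for k in range(n)
    ]


TRANSFORMS = {"DCT": dct, "DWT": dwt, "FFT": fft}

# The six orderings an ablation over three transforms has to cover (3! = 6).
ORDERS = list(permutations(TRANSFORMS))


def preprocess(signal, order):
    """Apply the named transforms to `signal` in the given order."""
    out = list(signal)
    for name in order:
        out = TRANSFORMS[name](out)
    return out


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
    for order in ORDERS:
        features = preprocess(sample, order)
        print(" -> ".join(order), [round(v, 3) for v in features])
```

In an actual ablation, each of the six preprocessed variants of the dataset would be fed to the classifier and compared on accuracy and inference time; the sketch only shows how the orderings are enumerated and applied.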
Pages: 86457-86478
Page count: 21