SwinSight: a hierarchical vision transformer using shifted windows to leverage aerial image classification

Authors
Pradhan P.K. [1, 2]
Das A. [3 ]
Kumar A. [3 ]
Baruah U. [4 ]
Sen B. [1 ]
Ghosal P. [1 ]
Affiliations
[1] Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, Sikkim, Majitar
[2] Centre for Computers and Communication Technology, Sikkim, Chisopani, South Sikkim
[3] CVPR Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, West Bengal, Kolkata
[4] Birangana Sati Sadhani Rajyik Vishwavidyalaya, Assam, Golaghat
Keywords
Aerial image classification; Convolution neural network; DCT-DWT-FFT; Deep learning; Swin transformer
DOI
10.1007/s11042-024-19615-9
Abstract
In aerial image classification, integrating advanced vision transformers with optimal preprocessing techniques is pivotal for enhancing model performance. This study presents SwinSight, a novel hierarchical vision transformer optimized for aerial image classification, which effectively addresses the computational challenges typically associated with transformers through a shifted window mechanism. The core of the research focuses on enhancing model performance by integrating a systematic preprocessing approach using Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Fast Fourier Transform (FFT). An extensive ablation study evaluates six permutations of these techniques, aiming to identify the most effective sequence for preprocessing. Results indicate that the sequence of DCT, followed by DWT, then FFT, significantly excels, achieving a high classification accuracy of 93.16% and maintaining a rapid inference time of 0.0049 seconds per frame. This sequence’s superior performance highlights the critical role of preprocessing order in optimizing feature extraction, thereby boosting the efficacy of the classification process. SwinSight’s advancements not only set a new benchmark for aerial image analysis but also offer broader implications for enhancing image processing workflows in various applications, contributing to theoretical insights and practical improvements in image-based machine learning tasks. This paper not only offers a practical solution for aerial image classification for diverse applications such as agriculture, environmental monitoring, land use applications, security, and beyond but also presents a novel SAIOD (Sikkim Aerial Images dataset for Object Detection) to the computer vision research community, fostering added advancements. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
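The ablation over six preprocessing orders mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the transforms are chained or applied to images, so everything here is an assumption: each stage works on a single 1-D row of pixel intensities, the FFT stage is stood in for by a naive DFT that returns magnitudes (an FFT computes the same values faster), the DWT is a single-level Haar transform on an even-length input, and the names `STAGES`, `ORDERINGS`, and `preprocess` are hypothetical.

```python
import cmath
import math
from itertools import permutations


def dct(x):
    """Unnormalised 1-D DCT-II of a list of numbers."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dwt(x):
    """Single-level Haar DWT: approximation coefficients, then detail coefficients.

    Assumes an even-length input (an illustrative simplification).
    """
    s = math.sqrt(2.0)
    pairs = list(zip(x[0::2], x[1::2]))
    return [(a + b) / s for a, b in pairs] + [(a - b) / s for a, b in pairs]


def dft_magnitude(x):
    """Magnitude spectrum via a naive O(n^2) DFT, standing in for the FFT stage."""
    n = len(x)
    return [
        abs(sum(x[i] * cmath.exp(-2j * math.pi * i * k / n) for i in range(n)))
        for k in range(n)
    ]


# Hypothetical registry of the three stages named in the abstract.
STAGES = {"DCT": dct, "DWT": dwt, "FFT": dft_magnitude}

# All 3! = 6 orderings evaluated in an ablation of this kind.
ORDERINGS = list(permutations(STAGES))


def preprocess(signal, order):
    """Apply the named stages to `signal` in the given order."""
    out = list(signal)
    for name in order:
        out = STAGES[name](out)
    return out


if __name__ == "__main__":
    row = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # made-up pixel row
    for order in ORDERINGS:
        features = preprocess(row, order)
        print(" -> ".join(order), [round(v, 2) for v in features[:3]])
```

In an actual ablation, each ordering's output would be fed to the classifier and the orderings compared on accuracy and inference time. The sketch only shows that the stages do not commute, so the order changes the features.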
Pages: 86457-86478
Page count: 21