SwinSight: a hierarchical vision transformer using shifted windows to leverage aerial image classification

Authors
Pradhan P.K. [1, 2]
Das A. [3 ]
Kumar A. [3 ]
Baruah U. [4 ]
Sen B. [1 ]
Ghosal P. [1 ]
Affiliations
[1] Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, Sikkim, Majitar
[2] Centre for Computers and Communication Technology, Sikkim, Chisopani, South Sikkim
[3] CVPR Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, West Bengal, Kolkata
[4] Birangana Sati Sadhani Rajyik Vishwavidyalaya, Assam, Golaghat
Keywords
Aerial image classification; Convolution neural network; DCT-DWT-FFT; Deep learning; Swin transformer
DOI
10.1007/s11042-024-19615-9
Abstract
In aerial image classification, integrating advanced vision transformers with optimal preprocessing techniques is pivotal for enhancing model performance. This study presents SwinSight, a novel hierarchical vision transformer optimized for aerial image classification, which effectively addresses the computational challenges typically associated with transformers through a shifted window mechanism. The core of the research focuses on enhancing model performance by integrating a systematic preprocessing approach using Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Fast Fourier Transform (FFT). An extensive ablation study evaluates six permutations of these techniques, aiming to identify the most effective sequence for preprocessing. Results indicate that the sequence of DCT, followed by DWT, then FFT, significantly excels, achieving a high classification accuracy of 93.16% and maintaining a rapid inference time of 0.0049 seconds per frame. This sequence’s superior performance highlights the critical role of preprocessing order in optimizing feature extraction, thereby boosting the efficacy of the classification process. SwinSight’s advancements not only set a new benchmark for aerial image analysis but also offer broader implications for enhancing image processing workflows in various applications, contributing to theoretical insights and practical improvements in image-based machine learning tasks. This paper not only offers a practical solution for aerial image classification for diverse applications such as agriculture, environmental monitoring, land use applications, security, and beyond but also presents a novel SAIOD (Sikkim Aerial Images dataset for Object Detection) to the computer vision research community, fostering added advancements. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
Pages: 86457-86478
Page count: 21
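
Illustrative sketch
The abstract describes an ablation over the six orderings of DCT, DWT, and FFT preprocessing. The snippet below is a minimal sketch of how such an ablation could be enumerated; it is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (dct2, haar_dwt2, fft2_mag, preprocess), the choice of a single-level Haar wavelet, the use of the FFT log-magnitude to keep the output real-valued, and the idea that each transform is applied to the full output of the previous one. The abstract does not specify any of these details.

```python
from itertools import permutations

import numpy as np


def dct2(x):
    """Orthonormal 2-D DCT-II, built from an explicit basis matrix."""
    def basis(n):
        k = np.arange(n)[:, None]   # frequency index
        i = np.arange(n)[None, :]   # sample index
        m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale
        return m
    return basis(x.shape[0]) @ x @ basis(x.shape[1]).T


def haar_dwt2(x):
    """Single-level 2-D Haar DWT (assumed wavelet); needs even height and width.

    The four sub-bands are tiled into one array of the input's shape.
    """
    s = np.sqrt(2.0)
    lo = (x[0::2, :] + x[1::2, :]) / s   # row-wise averages
    hi = (x[0::2, :] - x[1::2, :]) / s   # row-wise differences
    ll = (lo[:, 0::2] + lo[:, 1::2]) / s
    lh = (lo[:, 0::2] - lo[:, 1::2]) / s
    hl = (hi[:, 0::2] + hi[:, 1::2]) / s
    hh = (hi[:, 0::2] - hi[:, 1::2]) / s
    return np.block([[ll, lh], [hl, hh]])


def fft2_mag(x):
    """Log-magnitude of the 2-D FFT (assumed), so later stages see real values."""
    return np.log1p(np.abs(np.fft.fft2(x)))


# Hypothetical registry mapping the names used in the abstract to the sketches above.
TRANSFORMS = {"DCT": dct2, "DWT": haar_dwt2, "FFT": fft2_mag}

# The six orderings an ablation of this kind would compare.
ORDERS = list(permutations(TRANSFORMS))


def preprocess(image, order):
    """Apply the named transforms to a 2-D array in the given order."""
    out = np.asarray(image, dtype=np.float64)
    for name in order:
        out = TRANSFORMS[name](out)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8))  # stand-in for one grayscale image channel
    for order in ORDERS:
        features = preprocess(image, order)
        print(" -> ".join(order), features.shape)
```

In a real ablation, each ordering's output would feed the classifier, and accuracy and inference time would be recorded per ordering. The sketch stops at producing the preprocessed arrays.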