SwinSight: a hierarchical vision transformer using shifted windows to leverage aerial image classification

Authors
Pradhan P.K. [1, 2]
Das A. [3]
Kumar A. [3]
Baruah U. [4]
Sen B. [1]
Ghosal P. [1]
Affiliations
[1] Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, Sikkim, Majitar
[2] Centre for Computers and Communication Technology, Sikkim, Chisopani, South Sikkim
[3] CVPR Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, West Bengal, Kolkata
[4] Birangana Sati Sadhani Rajyik Vishwavidyalaya, Assam, Golaghat
Keywords
Aerial image classification; Convolution neural network; DCT-DWT-FFT; Deep learning; Swin transformer
DOI
10.1007/s11042-024-19615-9
Abstract
In aerial image classification, pairing advanced vision transformers with well-chosen preprocessing techniques is pivotal for model performance. This study presents SwinSight, a novel hierarchical vision transformer optimized for aerial image classification, which addresses the computational cost typically associated with transformers through a shifted window mechanism. The core of the research is a systematic preprocessing approach that combines the Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Fast Fourier Transform (FFT). An extensive ablation study evaluates the six permutations of these techniques to identify the most effective preprocessing sequence. Results indicate that DCT, followed by DWT, then FFT, performs best, achieving a classification accuracy of 93.16% with an inference time of 0.0049 seconds per frame. This sequence’s superior performance highlights the role of preprocessing order in feature extraction and, in turn, in the efficacy of classification. According to the authors, SwinSight sets a new benchmark for aerial image analysis and has broader implications for image processing workflows in other applications, contributing both theoretical insights and practical improvements to image-based machine learning tasks. The paper offers a practical solution for aerial image classification in applications such as agriculture, environmental monitoring, land use, and security, and it also presents a novel dataset, SAIOD (Sikkim Aerial Images dataset for Object Detection), to the computer vision research community to support further work. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
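The ablation described in the abstract, six orderings of three transforms, can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the transforms are chained, which wavelet is used, or how the complex FFT output is handled. The sketch assumes a grayscale image with even side lengths, a single-level Haar wavelet, and the FFT magnitude as the real-valued output. All function names (`dct2`, `haar_dwt2`, `fft2_mag`, `apply_sequence`) are hypothetical.

```python
from itertools import permutations

import numpy as np


def dct2(img):
    """Orthonormal 2-D DCT-II, built from an explicit cosine basis matrix."""
    def basis(n):
        k = np.arange(n)[:, None]   # frequency index
        i = np.arange(n)[None, :]   # sample index
        m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        m[0, :] /= np.sqrt(2.0)     # scale the DC row so the matrix is orthonormal
        return m
    h, w = img.shape
    return basis(h) @ img @ basis(w).T


def haar_dwt2(img):
    """Single-level 2-D Haar DWT (assumed wavelet); sub-bands tiled into one array."""
    s = np.sqrt(2.0)
    lo = (img[0::2, :] + img[1::2, :]) / s   # low-pass along rows
    hi = (img[0::2, :] - img[1::2, :]) / s   # high-pass along rows
    ll = (lo[:, 0::2] + lo[:, 1::2]) / s
    lh = (lo[:, 0::2] - lo[:, 1::2]) / s
    hl = (hi[:, 0::2] + hi[:, 1::2]) / s
    hh = (hi[:, 0::2] - hi[:, 1::2]) / s
    return np.block([[ll, lh], [hl, hh]])


def fft2_mag(img):
    """Magnitude of the 2-D FFT; phase is discarded (an assumption) to stay real."""
    return np.abs(np.fft.fft2(img))


STEPS = {"DCT": dct2, "DWT": haar_dwt2, "FFT": fft2_mag}

# The six candidate orderings evaluated in an ablation of this kind.
ORDERS = list(permutations(STEPS))


def apply_sequence(img, order):
    """Apply the named transforms to a 2-D array in the given order."""
    out = np.asarray(img, dtype=float)
    for name in order:
        out = STEPS[name](out)
    return out
```

Each ordering would then be used to preprocess the training images before fitting the classifier, for example `apply_sequence(image, ("DCT", "DWT", "FFT"))` for the ordering the abstract reports as best.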
Pages: 86457-86478
Page count: 21