ADA-ViT: Attention-Guided Data Augmentation for Vision Transformers

Cited by: 1
Authors
Baili, Nada [1 ]
Frigui, Hichem [1 ]
Affiliations
[1] Univ Louisville, Comp Sci & Engn Dept, 220 Eastern Pkwy, Louisville, KY 40292 USA
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Keywords
Vision Transformer; Data Augmentation
DOI
10.1109/ICIP49359.2023.10222908
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The limitations of a machine learning model can often be traced back to under-represented regions in the feature space of the training data. Data augmentation is a common technique used to inflate training datasets with new samples and improve model performance. However, these techniques usually focus on expanding the data in size and do not necessarily aim to cover the under-represented regions of the feature space. In this paper, we propose an Attention-guided Data Augmentation technique for Vision Transformers (ADA-ViT). Our framework exploits the attention mechanism in vision transformers to extract visual concepts related to misclassified samples. The retrieved concepts describe under-represented regions of the training dataset that contributed to the misclassifications. We leverage this information to guide the data augmentation process: we identify new samples and use them to augment the training data. We hypothesize that this focused augmentation populates under-represented regions and improves the model's accuracy. We evaluate our framework on the CUB dataset and CUB-Families. Our experiments show that ADA-ViT outperforms state-of-the-art data augmentation strategies and can improve the accuracy of a transformer by an average margin of 2.5% on the CUB dataset and 3.3% on CUB-Families.
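
The augmentation loop outlined in the abstract (attention-based concept extraction from misclassified samples, followed by retrieval of matching new samples) can be summarized in a minimal PyTorch sketch. This is an illustration under assumed interfaces only: the CLS-to-patch attention map, the patch features, the attention-weighted descriptor, and cosine-similarity retrieval are hypothetical choices for clarity, not the authors' released implementation.

# Hypothetical sketch of the attention-guided retrieval step described in the
# abstract. Tensor shapes, the use of CLS-to-patch attention as a "concept"
# descriptor, and cosine-similarity retrieval are all illustrative assumptions.
import torch
import torch.nn.functional as F

def concept_descriptor(cls_attn: torch.Tensor, patch_feats: torch.Tensor) -> torch.Tensor:
    # Summarize a misclassified image by attention-weighting its patch features.
    # cls_attn:    (num_patches,) attention from the CLS token to each patch.
    # patch_feats: (num_patches, dim) patch embeddings from the same ViT layer.
    w = cls_attn / cls_attn.sum()                      # normalize attention into weights
    return (w.unsqueeze(-1) * patch_feats).sum(dim=0)  # (dim,) concept vector

def retrieve_augmentations(misclassified, pool_feats, k=5):
    # Pick the k candidate-pool images closest to each misclassified sample's
    # concept descriptor; these are the samples used to augment the training set.
    # misclassified: list of (cls_attn, patch_feats) tuples.
    # pool_feats:    (num_pool, dim) global features of candidate images.
    descriptors = torch.stack([concept_descriptor(a, f) for a, f in misclassified])
    sims = F.cosine_similarity(descriptors.unsqueeze(1), pool_feats.unsqueeze(0), dim=-1)
    return sims.topk(k, dim=1).indices                 # (num_misclassified, k) pool indices

# Toy usage with random tensors standing in for real ViT outputs
# (196 patches and 768-dim features, as in a ViT-B/16 on 224x224 input).
if __name__ == "__main__":
    mis = [(torch.rand(196), torch.randn(196, 768)) for _ in range(3)]
    pool = torch.randn(1000, 768)
    print(retrieve_augmentations(mis, pool, k=5).shape)  # torch.Size([3, 5])
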
Pages: 385-389
Page count: 5