MHA-WoML: Multi-head attention and Wasserstein-OT for few-shot learning

Times Cited: 2
Authors
Yang, Junyan [1]
Jiang, Jie [1]
Guo, Yanming [1]
Affiliations
[1] Natl Univ Def Technol, Coll Syst Engn, Changsha, Hunan, Peoples R China
Keywords
Few-shot learning; Hierarchical multi-head attention; Wasserstein distance; Optimal transport theory
DOI
10.1007/s13735-022-00254-5
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Few-shot learning aims to classify novel classes with extremely few labeled samples. Existing metric-learning-based approaches tend to employ off-the-shelf CNN models for feature extraction and conventional clustering algorithms for feature matching. These methods neglect the importance of image regions and may fall into over-fitting during feature clustering. In this work, we propose a novel MHA-WoML framework for few-shot learning, which adaptively focuses on semantically dominant regions and relieves the over-fitting problem. Specifically, we first design a hierarchical multi-head attention (MHA) module, which consists of three functional heads with masks (i.e., a rare head, a syntactic head and a positional head), to extract comprehensive image features and screen out invalid features. The MHA module performs better than current transformers in few-shot recognition. Then, we incorporate optimal transport theory into the Wasserstein distance and propose a Wasserstein-OT metric learning (WoML) module for category clustering. The WoML module focuses on computing an appropriately approximate barycenter rather than an overly accurate sub-stage fit that may threaten the global fit, thus alleviating over-fitting during training. Experimental results show that our approach achieves remarkably better performance than current state-of-the-art methods, scoring about 3% higher accuracy across four benchmark datasets: MiniImageNet, TieredImageNet, CIFAR-FS and CUB200.
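To illustrate the kind of optimal-transport metric the abstract describes, the sketch below computes an entropy-regularized Wasserstein distance with Sinkhorn-Knopp scaling and uses it to match a query's local features against each class's support features in a toy few-shot episode. Everything concrete in it (the `sinkhorn_distance` and `classify_query` helpers, the squared-Euclidean cost, the regularization weight, and the episode setup) is an illustrative assumption, not the paper's WoML implementation.

```python
# Minimal sketch (assumed setup, not the authors' WoML module): classify a query
# by the entropy-regularized Wasserstein distance between its local features and
# each class's support features, computed with Sinkhorn-Knopp scaling.
import numpy as np


def sinkhorn_distance(X, Y, reg=0.5, n_iters=100):
    """Entropic OT cost between feature sets X (n, d) and Y (m, d)."""
    # Squared-Euclidean cost, scaled by the feature dimension to keep exp() stable.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1) / X.shape[1]
    K = np.exp(-C / reg)                                 # Gibbs kernel
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)      # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                             # Sinkhorn-Knopp iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                      # transport plan
    return float((P * C).sum())                          # approximate Wasserstein cost


def classify_query(query_feats, support_feats_per_class):
    """Assign the query to the class with the smallest Sinkhorn distance."""
    dists = [sinkhorn_distance(query_feats, s) for s in support_feats_per_class]
    return int(np.argmin(dists))


# Toy 5-way episode: each class is a set of 9 local feature vectors (dim 64).
rng = np.random.default_rng(0)
support = [rng.normal(loc=c, size=(9, 64)) for c in range(5)]
query = rng.normal(loc=2, size=(9, 64))
print(classify_query(query, support))  # expected output: 2 (the matching class)
```

In the paper's framework this distance would be computed on features produced by the MHA module, and the barycenter is kept deliberately approximate rather than fitted exactly, which is the mechanism the abstract credits with reducing over-fitting.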
Pages: 681-694
Page count: 14