MHA-WoML: Multi-head attention and Wasserstein-OT for few-shot learning

Times Cited: 2
Authors
Yang, Junyan [1]
Jiang, Jie [1]
Guo, Yanming [1]
Affiliations
[1] Natl Univ Def Technol, Coll Syst Engn, Changsha, Hunan, Peoples R China
Keywords
Few-shot learning; Hierarchical multi-head attention; Wasserstein distance; Optimal transport theory
DOI
10.1007/s13735-022-00254-5
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Few-shot learning aims to classify novel classes with extremely few labeled samples. Existing metric-learning-based approaches tend to employ off-the-shelf CNN models for feature extraction and conventional clustering algorithms for feature matching. These methods neglect the importance of image regions and may fall into over-fitting during feature clustering. In this work, we propose a novel MHA-WoML framework for few-shot learning, which adaptively focuses on semantically dominant regions and relieves the over-fitting problem. Specifically, we first design a hierarchical multi-head attention (MHA) module, which consists of three functional heads with masks (i.e., a rare head, a syntactic head and a positional head), to extract comprehensive image features and screen out invalid features. The MHA module performs better than current transformers in few-shot recognition. Then, we incorporate optimal transport theory into the Wasserstein distance and propose a Wasserstein-OT metric learning (WoML) module for category clustering. The WoML module focuses on computing an appropriately approximate barycenter rather than an overly accurate sub-stage fit that may threaten the global fit, thus alleviating over-fitting during training. Experimental results show that our approach achieves remarkably better performance than current state-of-the-art methods, scoring about 3% higher accuracy across four benchmark datasets: MiniImageNet, TieredImageNet, CIFAR-FS and CUB200.
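To illustrate the kind of optimal-transport metric the abstract describes, the sketch below computes an entropy-regularized Wasserstein distance with Sinkhorn-Knopp scaling and uses it to match a query's local features against each class's support features in a toy few-shot episode. Everything concrete in it (the `sinkhorn_distance` and `classify_query` helpers, the squared-Euclidean cost, the regularization weight, and the episode setup) is an illustrative assumption, not the paper's WoML implementation.

```python
# Minimal sketch (assumed setup, not the authors' WoML module): classify a query
# by the entropy-regularized Wasserstein distance between its local features and
# each class's support features, computed with Sinkhorn-Knopp scaling.
import numpy as np


def sinkhorn_distance(X, Y, reg=0.5, n_iters=100):
    """Entropic OT cost between feature sets X (n, d) and Y (m, d)."""
    # Squared-Euclidean cost, scaled by the feature dimension to keep exp() stable.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1) / X.shape[1]
    K = np.exp(-C / reg)                                 # Gibbs kernel
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)      # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                             # Sinkhorn-Knopp iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                      # transport plan
    return float((P * C).sum())                          # approximate Wasserstein cost


def classify_query(query_feats, support_feats_per_class):
    """Assign the query to the class with the smallest Sinkhorn distance."""
    dists = [sinkhorn_distance(query_feats, s) for s in support_feats_per_class]
    return int(np.argmin(dists))


# Toy 5-way episode: each class is a set of 9 local feature vectors (dim 64).
rng = np.random.default_rng(0)
support = [rng.normal(loc=c, size=(9, 64)) for c in range(5)]
query = rng.normal(loc=2, size=(9, 64))
print(classify_query(query, support))  # expected output: 2 (the matching class)
```

In the paper's framework this distance would be computed on features produced by the MHA module, and the barycenter is kept deliberately approximate rather than fitted exactly, which is the mechanism the abstract credits with reducing over-fitting.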
Pages: 681-694
Page count: 14