Dynamic visual-guided selection for zero-shot learning

Cited by: 2
Authors
Zhou, Yuan [1 ]
Xiang, Lei [1 ]
Liu, Fan [1 ]
Duan, Haoran [2 ]
Long, Yang [2 ]
Affiliations
[1] Nanjing Univ Informat Sci & Technol, Sch Artificial Intelligence, Nanjing 210044, Jiangsu, Peoples R China
[2] Univ Durham, Dept Comp Sci, Durham, England
Keywords
Visual-guided selection; Class prototype refinement; Task-relevant regions; Zero-shot learning;
DOI
10.1007/s11227-023-05625-1
Chinese Library Classification (CLC)
TP3 [computing technology; computer technology]
Discipline code
0812
Abstract
Zero-shot learning (ZSL) methods currently used to identify seen or unseen classes rely on semantic attribute prototypes or class information. However, hand-annotated attributes describe only the category, not each individual image belonging to it. Furthermore, attribute information is inconsistent across different images of the same category because of viewpoint variation. We therefore propose a dynamic visual-guided selection (DVGS) method, which dynamically focuses on different regions and refines the class prototype for each image. Instead of directly aligning an image's global feature with its semantic class vector, or its local features with all attribute vectors, the proposed method learns a vision-guided soft mask that refines the class prototype per image. It then uses the refined prototype to discover the most task-relevant regions for fine-grained recognition. Extensive experiments on three benchmarks verify the effectiveness of DVGS and establish new state-of-the-art results. DVGS achieves the best results on fine-grained datasets in both the conventional zero-shot learning (CZSL) and generalized zero-shot learning (GZSL) settings. In particular, on the SUN dataset, DVGS outperforms the second-best approach by 10.2% in the CZSL setting. Similarly, it surpasses the second-best method by an average of 4% on CUB in both the CZSL and GZSL settings. Although it secures only the second-best result on the AWA2 dataset, DVGS remains closely competitive, trailing the best performance by a mere 3.4% in CZSL and 1.2% in GZSL.
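The core idea described in the abstract can be illustrated with a toy sketch. The code below is not the authors' implementation; all names (`refine_prototypes`, `W`, `V`), dimensions, and the sigmoid-gated element-wise mask are illustrative assumptions about how a vision-guided soft mask could reweight a class's attribute prototype before scoring:

```python
import numpy as np

rng = np.random.default_rng(0)

def refine_prototypes(global_feat, prototypes, W):
    """Hypothetical sketch: derive an image-conditioned soft mask from the
    global visual feature and reweight each class prototype with it."""
    # Soft mask in (0, 1): one sigmoid gate per attribute dimension.
    mask = 1.0 / (1.0 + np.exp(-(W @ global_feat)))
    # Element-wise refinement: attributes not visible in this view are down-weighted.
    return prototypes * mask  # shape (num_classes, attr_dim)

def classify(global_feat, prototypes, W, V):
    """Score each class by projecting the visual feature into attribute
    space (via an assumed projection V) and comparing with refined prototypes."""
    refined = refine_prototypes(global_feat, prototypes, W)
    scores = refined @ (V @ global_feat)
    return int(np.argmax(scores))

# Toy dimensions, purely for illustration.
feat_dim, attr_dim, num_classes = 16, 8, 5
x = rng.standard_normal(feat_dim)            # global visual feature of one image
P = rng.standard_normal((num_classes, attr_dim))  # per-class attribute prototypes
W = rng.standard_normal((attr_dim, feat_dim))     # mask-generating projection
V = rng.standard_normal((attr_dim, feat_dim))     # visual-to-attribute projection
pred = classify(x, P, W, V)
print(pred)
```

The point of the sketch is the asymmetry the abstract emphasizes: the prototype, not the image feature, is adapted per image, so two views of the same class can be matched against differently weighted versions of the same attribute vector.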
Pages: 4401–4419
Number of pages: 19