PSVMA+: Exploring Multi-Granularity Semantic-Visual Adaption for Generalized Zero-Shot Learning

Cited by: 0
|
Authors
Liu, Man [1 ]
Bai, Huihui [1 ]
Li, Feng [2 ]
Zhang, Chunjie [1 ]
Wei, Yunchao [1 ]
Wang, Meng [2 ]
Chua, Tat-Seng [3 ]
Zhao, Yao [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Inst Informat Sci, Beijing Key Lab Adv Informat Sci & Network Technol, Beijing 100082, Peoples R China
[2] Hefei Univ Technol, Hefei 230002, Peoples R China
[3] Natl Univ Singapore, Singapore 119077, Singapore
Funding
Beijing Natural Science Foundation; National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Multi-granularity; zero-shot learning; semantic-visual interactions;
DOI
10.1109/TPAMI.2024.3467229
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Generalized zero-shot learning (GZSL) endeavors to identify the unseen categories using knowledge from the seen domain, necessitating the intrinsic interactions between the visual features and attribute semantic features. However, GZSL suffers from insufficient visual-semantic correspondences due to the attribute diversity and instance diversity. Attribute diversity refers to varying semantic granularity in attribute descriptions, ranging from low-level (specific, directly observable) to high-level (abstract, highly generic) characteristics. This diversity challenges the collection of adequate visual cues for attributes at a single granularity. Additionally, diverse visual instances corresponding to the same sharing attributes introduce semantic ambiguity, leading to vague visual patterns. To tackle these problems, we propose a multi-granularity progressive semantic-visual mutual adaption (PSVMA+) network, where sufficient visual elements across granularity levels can be gathered to remedy the granularity inconsistency. PSVMA+ explores semantic-visual interactions at different granularity levels, enabling awareness of multi-granularity in both visual and semantic elements. At each granularity level, the dual semantic-visual transformer module (DSVTM) recasts the sharing attributes into instance-centric attributes and aggregates the semantic-related visual regions, thereby learning unambiguous visual features to accommodate various instances. Given the diverse contributions of different granularities, PSVMA+ employs selective cross-granularity learning to leverage knowledge from reliable granularities and adaptively fuses multi-granularity features for comprehensive representations. Experimental results demonstrate that PSVMA+ consistently outperforms state-of-the-art methods.
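To make the mechanism described in the abstract more concrete, below is a minimal PyTorch-style sketch of the two core ideas: cross-attention that adapts sharing attributes into instance-centric ones at a single granularity level, and a gated fusion over several granularity levels. This is an illustration only; the module names (SemanticVisualCrossAttention, MultiGranularityFusion), tensor shapes, and the gating scheme are assumptions for exposition, not the authors' released PSVMA+ implementation.

```python
# Illustrative sketch only: cross-attention between attribute prototypes and
# visual patches, plus gated multi-granularity fusion. All names and shapes
# are assumptions, not the authors' code.
import torch
import torch.nn as nn


class SemanticVisualCrossAttention(nn.Module):
    """Adapts sharing attribute prototypes into instance-centric attributes by
    attending over patch features, then pools semantic-related visual regions."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        # attributes query visual patches -> instance-centric attributes
        self.attr_from_visual = nn.MultiheadAttention(dim, heads, batch_first=True)
        # updated attributes then guide aggregation of visual regions
        self.visual_from_attr = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, attr_emb, patch_feat):
        # attr_emb:   (B, K, D) sharing attribute embeddings (K attributes)
        # patch_feat: (B, N, D) visual patch features at one granularity level
        inst_attr, _ = self.attr_from_visual(attr_emb, patch_feat, patch_feat)
        inst_attr = self.norm_a(attr_emb + inst_attr)       # instance-centric attributes
        vis, _ = self.visual_from_attr(patch_feat, inst_attr, inst_attr)
        vis = self.norm_v(patch_feat + vis)                 # semantic-related visual features
        return inst_attr, vis.mean(dim=1)                   # pooled visual representation


class MultiGranularityFusion(nn.Module):
    """Weights per-granularity features by a learned reliability score before
    fusing them; a simple stand-in for selective cross-granularity learning."""

    def __init__(self, dim=512, levels=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            SemanticVisualCrossAttention(dim) for _ in range(levels)
        )
        self.gate = nn.Linear(dim, 1)  # reliability score per granularity level

    def forward(self, attr_emb, multi_level_patches):
        pooled = []
        for block, patches in zip(self.blocks, multi_level_patches):
            _, v = block(attr_emb, patches)
            pooled.append(v)
        feats = torch.stack(pooled, dim=1)                  # (B, L, D)
        weights = torch.softmax(self.gate(feats), dim=1)    # (B, L, 1)
        return (weights * feats).sum(dim=1)                 # fused representation


if __name__ == "__main__":
    B, K, D = 2, 85, 512                                    # e.g. 85 attributes as in AWA2
    patch_levels = [torch.randn(B, n, D) for n in (196, 49, 16)]  # fine-to-coarse patch grids
    attrs = torch.randn(B, K, D)
    fused = MultiGranularityFusion(dim=D, levels=3)(attrs, patch_levels)
    print(fused.shape)                                      # torch.Size([2, 512])
```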
Pages: 51-66
Number of pages: 16
Related Papers
50 records in total
  • [21] Learning semantic consistency for audio-visual zero-shot learning. Li, Xiaoyong; Yang, Jing; Chen, Yuling; Zhang, Wei; Ruan, Xiaoli; Li, Chengjiang; Su, Zhidong. ARTIFICIAL INTELLIGENCE REVIEW, 58 (7)
  • [22] Transductive Visual-Semantic Embedding for Zero-shot Learning. Xu, Xing; Shen, Fumin; Yang, Yang; Shao, Jie; Huang, Zi. PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017: 41-49
  • [23] Indirect visual-semantic alignment for generalized zero-shot recognition. Chen, Yan-He; Yeh, Mei-Chen. MULTIMEDIA SYSTEMS, 2024, 30 (02)
  • [24] Rethinking semantic-visual alignment in zero-shot object detection via a softplus margin focal loss. Li, Qianzhong; Zhang, Yujia; Sun, Shiying; Zhao, Xiaoguang; Li, Kang; Tan, Min. NEUROCOMPUTING, 2021, 449: 117-135
  • [25] VS-Boost: Boosting Visual-Semantic Association for Generalized Zero-Shot Learning. Li, Xiaofan; Zhang, Yachao; Bian, Shiran; Qu, Yanyun; Xie, Yuan; Shi, Zhongchao; Fan, Jianping. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023: 1107-1115
  • [26] Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning. Chandhok, Shivam; Balasubramanian, Vineeth N. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021: 3099-3107
  • [27] Contrastive visual feature filtering for generalized zero-shot learning. Meng, Shixuan; Jiang, Rongxin; Tian, Xiang; Zhou, Fan; Chen, Yaowu; Liu, Junjie; Shen, Chen. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024
  • [28] Contrastive semantic disentanglement in latent space for generalized zero-shot learning. Fan, Wentao; Liang, Chen; Wang, Tian. KNOWLEDGE-BASED SYSTEMS, 2022, 257
  • [30] SVDML: Semantic and Visual Space Deep Mutual Learning for Zero-Shot Learning. Lu, Nannan; Luo, Yi; Qiu, Mingkai. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT IX, 2024, 14433: 383-395