Hierarchical Attention Network for Open-Set Fine-Grained Image Recognition

Cited by: 2
Authors
Sun, Jiayin [1 ,2 ,3 ]
Wang, Hong [4 ]
Dong, Qiulei [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[4] Univ Chinese Acad Sci, Coll Life Sci, Beijing 100049, Peoples R China
Keywords
Transformers; Feature extraction; Task analysis; Image recognition; Training; Visualization; Computer vision; Open-set fine-grained image recognition; hierarchical attention; long-short term memory; TEMPORAL ATTENTION; DIFFICULTY;
DOI
10.1109/TCSVT.2023.3325001
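Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification code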
0808 ; 0809 ;
Abstract
Triggered by the success of transformers in various visual tasks, the spatial self-attention mechanism has recently attracted more and more attention in the computer vision community. However, we empirically found that a typical vision transformer with the spatial self-attention mechanism could not learn accurate attention maps for distinguishing different categories of fine-grained images. To address this problem, motivated by the temporal attention mechanism in brains, we propose a hierarchical attention network for learning fine-grained feature representations, called HAN, where the features learnt by implementing a sequence of spatial self-attention operations corresponding to multiple moments are aggregated progressively. The proposed HAN consists of four modules: a self-attention backbone module for learning a sequence of features with self-attention operations, a spatial feature self-organizing module for facilitating the model training, a hierarchical aggregation module for aggregating the re-organized features via a Long Short-Term Memory network, and a context-aware module that is implemented as the forget block of the hierarchical aggregation module for preserving/forgetting the long-term memory by utilizing contextual information. Then, we propose a HAN-based method for open-set fine-grained recognition by integrating the proposed HAN network with a linear classifier, called HAN-OSFGR. Extensive experimental results on 3 fine-grained datasets and 2 coarse-grained datasets demonstrate that the proposed HAN-OSFGR outperforms 9 state-of-the-art open-set recognition methods significantly in most cases.
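The aggregation and open-set steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a simplified, coupled forget/input gate rather than a full Long Short-Term Memory cell, a forget gate driven by the mean of the current feature vector as a stand-in for contextual information, and a maximum-softmax-probability threshold as the rule for rejecting unknown classes (the abstract does not state the rejection rule). The names `aggregate`, `classify_open_set`, `forget_bias`, and `threshold` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate(features, forget_bias=0.0):
    """Progressively aggregate a sequence of feature vectors.

    features: list of equal-length lists of floats, one per "moment".
    Assumption: the forget gate is computed from the mean of the current
    feature vector, a toy substitute for a context-aware block.
    """
    dim = len(features[0])
    memory = [0.0] * dim  # long-term memory, starts empty
    for feat in features:
        context = sum(feat) / dim                 # toy contextual summary
        forget = sigmoid(context + forget_bias)   # share of old memory kept
        candidate = [math.tanh(v) for v in feat]  # new content to write
        # Coupled gate (assumption): what is not kept is overwritten.
        memory = [forget * m + (1.0 - forget) * c
                  for m, c in zip(memory, candidate)]
    return memory


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_open_set(feature, weights, biases, threshold=0.5):
    """Linear classifier with a rejection option.

    Returns the index of the predicted known class, or -1 ("unknown")
    when the top softmax probability falls below `threshold`.
    Assumption: thresholding is a common baseline, not the paper's rule.
    """
    logits = [sum(w * x for w, x in zip(row, feature)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else -1


if __name__ == "__main__":
    sequence = [[0.2, -0.1], [0.5, 0.3], [1.0, -0.4]]
    fused = aggregate(sequence)
    label = classify_open_set(fused, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    print(fused, label)
```

In this sketch, a confident prediction returns a known-class index, while a flat probability distribution is mapped to -1, which is how an open-set method distinguishes known categories from unseen ones.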
Pages: 3891 - 3904
Page count: 14
Related papers
50 in total
  • [31] Learning to locate for fine-grained image recognition
    Chen, Jiamin
    Hu, Jianguo
    Li, Shiren
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 206
  • [32] CAMV: Class Activation Mapping Value Towards Open Set Fine-Grained Recognition
    Dai, Wei
    Diao, Wenhui
    Sun, Xian
    Zhang, Yue
    Zhao, Liangjin
    Li, Jun
    Fu, Kun
    IEEE ACCESS, 2021, 9 : 8167 - 8177
  • [33] Towards Fine-Grained Unknown Class Detection Against the Open-Set Attack Spectrum With Variable Legitimate Traffic
    Zhao, Ziming
    Li, Zhaoxuan
    Xie, Xiaofei
    Yu, Jiongchi
    Zhang, Fan
    Zhang, Rui
    Chen, Binbin
    Luo, Xiangyang
    Hu, Ming
    Ma, Wenrui
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (05) : 3945 - 3960
  • [34] Context-Aware Visual Policy Network for Fine-Grained Image Captioning
    Zha, Zheng-Jun
    Liu, Daqing
    Zhang, Hanwang
    Zhang, Yongdong
    Wu, Feng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (02) : 710 - 722
  • [35] Exploring Rich Semantics for Open-Set Action Recognition
    Hu, Yufan
    Gao, Junyu
    Dong, Jianfeng
    Fan, Bin
    Liu, Hongmin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 5410 - 5421
  • [36] Learning Structured Relation Embeddings for Fine-Grained Fashion Attribute Recognition
    Zhu, Shumin
    Zou, Xingxing
    Qian, Jianjun
    Wong, Wai Keung
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1652 - 1664
  • [37] Part-Guided Relational Transformers for Fine-Grained Visual Recognition
    Zhao, Yifan
    Li, Jia
    Chen, Xiaowu
    Tian, Yonghong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 9470 - 9481
  • [38] Deep LSAC for Fine-Grained Recognition
    Lin, Di
    Wang, Yi
    Liang, Lingyu
    Li, Ping
    Chen, C. L. Philip
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (01) : 200 - 214
  • [39] TOAN: Target-Oriented Alignment Network for Fine-Grained Image Categorization With Few Labeled Samples
    Huang, Huaxi
    Zhang, Junjie
    Yu, Litao
    Zhang, Jian
    Wu, Qiang
    Xu, Chang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (02) : 853 - 866
  • [40] Hierarchical Self-Distilled Feature Learning for Fine-Grained Visual Categorization
    Hu, Yutao
    Jiang, Xiaolong
    Liu, Xuhui
    Luo, Xiaoyan
    Hu, Yao
    Cao, Xianbin
    Zhang, Baochang
    Zhang, Jun
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, : 1 - 14