Dual Guidance Enabled Fuzzy Inference for Enhanced Fine-Grained Recognition

Times cited: 13
Authors
Chen, Qiupu [1]
He, Feng [1]
Wang, Gang [2,3]
Bai, Xiao [4]
Cheng, Long [5]
Ning, Xin [6]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
[2] NingboTech Univ, Sch Comp & Data Engn, Ningbo 315100, Peoples R China
[3] Imperial Coll London, Dept Bioengn, London SW7 2AZ, England
[4] Beihang Univ, Beijing 100191, Peoples R China
[5] North China Elect Power Univ, Sch Control & Comp Engn, Beijing 102206, Peoples R China
[6] Chinese Acad Sci, Inst Semicond, Beijing 100045, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Task analysis; Fuzzy logic; Feature extraction; Transformers; Accuracy; Decision making; Semantics; Fine-grained image recognition; fuzzy inference system (FIS); multiscale; prediction; vision transformer (ViT)
DOI
10.1109/TFUZZ.2024.3427654
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the field of fine-grained visual recognition (FGVR), the ability to resolve minute and often subtle differences between highly similar object categories is paramount. The advent of vision transformers (ViTs) has marked a significant advancement in this domain, primarily due to their capacity to model the intricate interdependencies among object parts represented as image patches. However, their inherent single-scale processing limits their effectiveness in FGVR tasks. Furthermore, the uncertainty inherent in FGVR tasks remains unresolved, calling for methods that make these models more robust, particularly across different scales of visual features. We introduce a new plug-in module that can be seamlessly integrated into ViTs, called dual guidance enabled fuzzy inference (DGEFI), which combines fuzzy inference with dual guidance mechanisms. The dual guidance comprises scale-aware guidance and probability guidance. The former strengthens the model's focus on salient scales, and the latter sharpens the distinction between similar categories by optimizing intraclass compactness and interclass separability. Fuzzy inference enables the model to adaptively adjust the influence of different scales in the final decision-making phase, thereby improving overall recognition accuracy. We demonstrate the versatility and efficacy of the DGEFI module by integrating it into several leading ViT backbones, including ViT, Swin, MViTv2, and EVA-02. Empirical results show substantial performance gains; in particular, integrating DGEFI into EVA-02 reaches accuracies of 93.6% on the CUB-200-2011 dataset and 94.5% on the NA-Birds dataset, improving on the previous state-of-the-art method by 0.5% and 1.5%, respectively.
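The abstract does not give DGEFI's membership functions, rule base, or fusion equations. As a hypothetical illustration of the general idea only (fuzzy inference adjusting how much each scale contributes to the final decision), the sketch below uses a zero-order Takagi-Sugeno fuzzy system: each scale's prediction entropy is turned into a confidence score, fuzzified by three Gaussian sets, and defuzzified into a fusion weight. The rule base, the entropy-based confidence, and the function names `confidence`, `scale_weight`, and `fuse` are assumptions, not the authors' design; the scale-aware and probability guidance terms are not modeled.

```python
import math
from typing import List, Sequence

# HYPOTHETICAL rule base for a zero-order Takagi-Sugeno fuzzy system.
# Each rule: (Gaussian center, Gaussian width, crisp consequent weight).
RULES = [
    (0.2, 0.15, 0.1),  # IF confidence is LOW    THEN scale weight is small
    (0.5, 0.15, 0.5),  # IF confidence is MEDIUM THEN scale weight is moderate
    (0.9, 0.15, 1.0),  # IF confidence is HIGH   THEN scale weight is large
]


def gaussian_mf(x: float, center: float, width: float) -> float:
    """Degree (0..1] to which x belongs to a Gaussian fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def confidence(probs: Sequence[float]) -> float:
    """1 minus normalized Shannon entropy: 1.0 for one-hot, ~0.0 for uniform.

    Assumes at least two classes and a probability vector summing to 1.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def scale_weight(conf: float) -> float:
    """Fuzzify conf, fire all rules, defuzzify by firing-strength-weighted average."""
    firing = [gaussian_mf(conf, c, w) for c, w, _ in RULES]
    return sum(f * out for f, (_, _, out) in zip(firing, RULES)) / sum(firing)


def fuse(per_scale_probs: List[Sequence[float]]) -> List[float]:
    """Fuse per-scale class probabilities using normalized fuzzy weights."""
    weights = [scale_weight(confidence(p)) for p in per_scale_probs]
    total = sum(weights)
    weights = [w / total for w in weights]
    n_classes = len(per_scale_probs[0])
    return [
        sum(w * p[k] for w, p in zip(weights, per_scale_probs))
        for k in range(n_classes)
    ]


# Usage with made-up outputs from two hypothetical scale-specific heads.
scales = [
    [0.90, 0.05, 0.05],  # confident head
    [0.40, 0.35, 0.25],  # uncertain head
]
fused = fuse(scales)
print([round(x, 3) for x in fused])
```

With these made-up inputs the confident head receives most of the weight, so the fused distribution leans toward it more than a plain average of the two heads would. In a trained model the membership functions and consequents would be learned or tuned rather than fixed constants as here.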
Pages: 418-430
Page count: 13