Dual Guidance Enabled Fuzzy Inference for Enhanced Fine-Grained Recognition

Cited by: 13
Authors
Chen, Qiupu [1]
He, Feng [1]
Wang, Gang [2, 3]
Bai, Xiao [4]
Cheng, Long [5]
Ning, Xin [6]
Affiliations
[1] Univ Sci & Technol China, Hefei 230026, Peoples R China
[2] NingboTech Univ, Sch Comp & Data Engn, Ningbo 315100, Peoples R China
[3] Imperial Coll London, Dept Bioengn, London SW7 2AZ, England
[4] Beihang Univ, Beijing 100191, Peoples R China
[5] North China Elect Power Univ, Sch Control & Comp Engn, Beijing 102206, Peoples R China
[6] Chinese Acad Sci, Inst Semicond, Beijing 100045, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Task analysis; Fuzzy logic; Feature extraction; Transformers; Accuracy; Decision making; Semantics; Fine-grained image recognition; fuzzy inference system (FIS); multiscale; prediction; vision transformer (ViT)
DOI
10.1109/TFUZZ.2024.3427654
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In fine-grained visual recognition (FGVR), the ability to resolve subtle differences between highly similar object categories is paramount. Vision transformers (ViTs) have marked a significant advance in this domain, primarily because they can model the intricate interdependencies among object parts represented as image patches. However, their inherent single-scale processing limits their effectiveness in FGVR tasks. Furthermore, the uncertainty inherent in FGVR tasks remains unresolved, which calls for methods that make these models more robust, particularly across varying scales of visual features. We introduce a new plug-in module that can be seamlessly integrated into ViTs, called dual guidance enabled fuzzy inference (DGEFI), which combines fuzzy inference with dual guidance mechanisms. Dual guidance comprises scale-aware guidance and probability guidance. The former strengthens the model's focus on salient scales, and the latter refines the distinction between similar categories by optimizing intraclass compactness and interclass separability. Fuzzy inference enables the model to adaptively adjust the influence of distinct scales in the final decision-making phase, thereby enhancing overall recognition accuracy. We demonstrate the versatility and efficacy of the DGEFI module by integrating it into several leading ViT backbones, including ViT, Swin, MViTv2, and EVA-02. Empirical results show substantial performance gains: integrating DGEFI into EVA-02 reaches 93.6% accuracy on the CUB-200-2011 dataset and 94.5% on the NABirds dataset, improving over the state-of-the-art method by 0.5% and 1.5%, respectively.
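The abstract says that fuzzy inference adaptively adjusts the influence of each scale in the final decision. The record does not give the paper's actual rules or membership functions, so the following is only a minimal illustrative sketch, not the authors' DGEFI. It assumes that each scale produces class logits, that uncertainty is measured by normalized entropy, and that a descending ramp membership function (with hypothetical thresholds `lo` and `hi`) maps uncertainty to a fusion weight. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy scaled to [0, 1]; 1 means a uniform prediction."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def high_confidence_membership(u, lo=0.2, hi=0.8):
    """Assumed ramp membership: 1 for uncertainty <= lo, 0 for >= hi."""
    if u <= lo:
        return 1.0
    if u >= hi:
        return 0.0
    return (hi - u) / (hi - lo)


def fuse_scales(per_scale_logits):
    """Weight each scale's prediction by its membership in 'confident'."""
    probs = [softmax(logits) for logits in per_scale_logits]
    weights = [high_confidence_membership(normalized_entropy(p)) for p in probs]
    total = sum(weights)
    if total == 0:
        # No scale is confident: fall back to a plain average.
        weights = [1.0] * len(probs)
        total = float(len(probs))
    weights = [w / total for w in weights]
    n_classes = len(probs[0])
    fused = [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_classes)]
    return fused, weights


# Hypothetical logits from two scales for a three-class problem:
# the first scale is confident, the second is close to uniform.
fused, weights = fuse_scales([[4.0, 0.0, 0.0], [0.1, 0.0, 0.2]])
print(weights, fused)
```

In this toy setup the near-uniform scale receives little or no weight, so the fused prediction follows the confident scale. A learned fuzzy inference system would replace the fixed thresholds with trainable membership functions and rules.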
Pages: 418-430
Page count: 13