Explainability Enhanced Object Detection Transformer With Feature Disentanglement

Cited: 0
Authors
Yu, Wenlong [1 ,2 ]
Liu, Ruonan [1 ,2 ]
Chen, Dongyue [1 ,2 ]
Hu, Qinghua [1 ,2 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300350, Peoples R China
[2] Tianjin Univ, Tianjin Key Lab Machine Learning, Tianjin 300350, Peoples R China
Keywords
Feature extraction; Transformers; Object detection; Mathematical models; Computational modeling; Analytical models; Visualization; Semantics; Deep learning; Vectors; explainability; feature disentanglement; hybrid transformer model; object detection; representation; models
DOI
10.1109/TIP.2024.3492733
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Explainability is a pivotal factor in determining whether a deep learning model can be authorized for use in critical applications. To enhance the explainability of end-to-end DEtection TRansformer (DETR) models, we introduce a disentanglement method that constrains the feature learning process, following a divide-and-conquer decoupling paradigm similar to how people understand complex real-world problems. We first demonstrate that features are entangled between the extractor and the detector, and we find that the regression function is a key factor in the deterioration of disentangled feature activation. These highly entangled features tend to activate on local characteristics, making it difficult to cover the semantic information of an object and reducing the interpretability of single-backbone object detection models. We therefore propose an Explainability Enhanced object detection Transformer with feature Disentanglement (DETD) model, in which Tensor Singular Value Decomposition (T-SVD) is used to produce feature bases and a Batch-averaged Feature Spectral Penalization (BFSP) loss is introduced to constrain the disentanglement of the features and balance their semantic activation. The proposed method is applied across three prominent backbones, two DETR variants, and a CNN-based model. Combining two optimization techniques, extensive experiments on two datasets consistently show that DETD outperforms its counterparts in both object detection performance and feature disentanglement. Grad-CAM visualizations confirm the improved explainability of feature learning from the disentanglement perspective.
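The abstract's two core ingredients — t-SVD feature bases and a batch-averaged spectral penalty — can be sketched in a few lines. The sketch below is only an illustration of the general technique, not the paper's implementation: the standard t-SVD is computed by an FFT along the channel mode followed by a per-slice SVD, and the penalty shown here (`bfsp_loss`, a hypothetical name) encourages a flat singular-value spectrum via negative entropy; the exact BFSP formulation in the paper may differ.

```python
import numpy as np

def t_svd_spectrum(features):
    """t-SVD spectrum of a feature tensor of shape (h, w, c):
    FFT along the channel (tube) mode, then an SVD per frontal slice.
    Returns the singular values of each transformed slice."""
    f_hat = np.fft.fft(features, axis=2)  # transform along the third mode
    spectra = [np.linalg.svd(f_hat[:, :, k], compute_uv=False)
               for k in range(features.shape[2])]
    return np.stack(spectra)              # shape (c, min(h, w))

def bfsp_loss(batch_features, eps=1e-8):
    """Hypothetical batch-averaged spectral penalty: average the t-SVD
    spectra over the batch and channels, normalize to a distribution,
    and penalize a concentrated (entangled) spectrum via negative
    entropy, so minimizing the loss flattens/balances the spectrum."""
    spectra = np.stack([t_svd_spectrum(f) for f in batch_features])  # (b, c, r)
    mean_spec = spectra.mean(axis=(0, 1))                            # (r,)
    p = mean_spec / (mean_spec.sum() + eps)
    return float(np.sum(p * np.log(p + eps)))  # -H(p): lower = flatter spectrum
```

In a training loop this penalty would be added, with a weighting coefficient, to the detection losses so that gradient descent jointly optimizes detection accuracy and spectral balance of the backbone features.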
Pages: 6439-6454
Page count: 16