Diffusion Models Based Unconditional Counterfactual Explanations Generation

Cited by: 0
Authors
Zhong, Zhi [1 ]
Wang, Yu [2 ]
Zhu, Ziye [1 ]
Li, Yun [1 ]
Affiliations
[1] School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing
[2] School of Science, China Pharmaceutical University, Nanjing
Source
Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence | 2024 / Vol. 37 / No. 11
Fund
National Natural Science Foundation of China
Keywords
Adversarial Attack; Counterfactual Explanation; Deep Learning; Diffusion Model; Interpretability;
DOI
10.16451/j.cnki.issn1003-6059.202411006
Abstract
Counterfactual explanations reveal the key factors behind a model's decisions by applying minimal, interpretable modifications to the input that alter the model's output. Existing counterfactual explanation methods based on diffusion models rely on conditional generation and therefore require additional classification-related semantic information; ensuring the quality of this semantic information is difficult, and obtaining it increases computational cost. To address these issues, an unconditional counterfactual explanation generation method based on the denoising diffusion implicit model (DDIM) is proposed. By exploiting the consistency of DDIM's reverse denoising process, noisy images are treated as latent variables that control the generated outputs, making the diffusion model suitable for unconditional counterfactual explanation workflows. The method then fully exploits DDIM's ability to filter out high-frequency noise and out-of-distribution perturbations, reconstructing the unconditional counterfactual explanation workflow so that it produces semantically interpretable modifications. Extensive experiments on different datasets demonstrate that the proposed method achieves superior results across multiple metrics. © 2024 Science Press. All rights reserved.
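To make the mechanism the abstract leans on concrete, the sketch below (not the authors' code) shows the standard deterministic DDIM reverse step with the stochastic term set to zero (eta = 0). Under this setting the reverse process is a deterministic map, so a noisy image x_T acts as a latent variable that always decodes to the same output, which is what allows perturbations applied in noise space to be denoised back toward the data distribution. The names ddim_step, eps_model, and alpha_bar are hypothetical stand-ins for a trained noise-prediction network and its noise schedule.

    import torch

    def ddim_step(x_t, t, t_prev, alpha_bar, eps_model):
        # One deterministic (eta = 0) DDIM reverse step from x_t to x_{t_prev}.
        eps = eps_model(x_t, t)                        # predicted noise
        a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]  # cumulative alphas
        # Estimate the clean image x_0 from the current noisy sample.
        x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        # Deterministically re-noise the estimate to the earlier timestep.
        return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps

    # Toy usage with a dummy noise predictor (illustration only).
    alpha_bar = torch.linspace(0.9999, 0.0001, 1000)  # hypothetical schedule
    eps_model = lambda x, t: torch.zeros_like(x)      # stand-in network
    x = torch.randn(1, 3, 32, 32)                     # x_T: the latent variable
    ts = list(range(999, 0, -100)) + [0]              # 999, 899, ..., 99, 0
    for t, t_prev in zip(ts[:-1], ts[1:]):
        x = ddim_step(x, t, t_prev, alpha_bar, eps_model)

Because the map from x_T to the output is deterministic, modifying x_T and re-running this loop yields a consistent, in-distribution image, which is one reading of how the abstract's unconditional workflow filters high-frequency and out-of-distribution perturbations.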
Pages: 1010-1021
Number of pages: 11
相关论文
共 51 条
  • [1] CHANG C H, CREAGER E, GOLDENBERG A, Et al., Explaining Image Classifiers by Counterfactual Generation
  • [2] VERMA S, BOONSANONG V, HOANG M, Et al., Counterfactual Explanations for Machine Learning: A Review
  • [3] WACHTER S, MITTELSTADT B, RUSSELL C., Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR [ C / OL ]
  • [4] GOODFELLOW I J, SHLENS J, SZEGEDY C., Explaining and Harnessing Adversarial Examples
  • [5] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, Et al., Generative Adversarial Nets, Proc of the 27th International Conference on Neural Information Processing Systems, II, pp. 2672-2680, (2014)
  • [6] KINGMA D P, WELLING M., Auto-Encoding Variational Bayes
  • [7] HO J, JAIN A, ABBEEL P., Denoising Diffusion Probabilistic Models, Proc of the 34th International Conference on Neural Information Processing Systems, pp. 6840-6851, (2020)
  • [8] SONG J M, MENG C L, ERMON S., Denoising Diffusion Implicit Models[ C / OL]
  • [9] B魻HLE M, FRITZ M, SCHIELE B., Convolutional Dynamic Alignment Networks for Interpretable Classifications, Proc of the IEEE/ CVF Conference on Computer Vision and Pattern Recognition, pp. 10024-10033, (2021)
  • [10] B魻HLE M, FRITZ M, SCHIELE B., B-cos Networks: Alignment Is All We Need for Interpretability, Proc of the IEEE/ CVF Conference on Computer Vision and Pattern Recognition, pp. 10319-10328, (2022)