Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers

Cited by: 0
Authors
Pan, Haowen [1]
Cao, Yixin [2]
Wang, Xiaozhi [3]
Yang, Xun [1]
Wang, Meng [4]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[3] Tsinghua Univ, Beijing, Peoples R China
[4] Hefei Univ Technol, Hefei, Peoples R China
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024 | 2024
Funding
National Natural Science Foundation of China
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding the internal mechanisms by which multi-modal large language models (LLMs) interpret different modalities and integrate cross-modal representations is becoming increasingly critical for continued progress in both academia and industry. In this paper, we propose a novel method to identify the key neurons behind an interpretability question: how do multi-modal LLMs bridge visual and textual concepts for captioning? Our method improves on conventional approaches in both efficiency and range of application by removing the need for costly gradient computation. Building on the identified neurons, we further design a multi-modal knowledge editing method, which is useful for mitigating sensitive words or hallucinations. We provide a theoretical assumption as the rationale for our design, and for empirical evaluation we conduct extensive quantitative and qualitative experiments. The results not only validate the effectiveness of our methods but also offer insightful findings that highlight three key properties of multi-modal neurons: sensitivity, specificity, and causal effect, shedding light on future research.
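The record contains no code, but the two steps the abstract describes (gradient-free neuron identification, then weight-level editing) can be illustrated compactly. The Python/PyTorch sketch below is an assumption-laden illustration, not the authors' implementation: it assumes a standard Transformer FFN whose down-projection rows act as per-neuron "value vectors", scores each neuron by its activation times the projection of its value vector onto the target token's unembedding direction (so no backward pass is needed), and edits by shifting the top-scoring value vectors. All names here (score_neurons, edit_neurons, alpha) are hypothetical.

import torch

def score_neurons(activations, w_out, unembedding, token_id):
    # Gradient-free attribution: a neuron's additive contribution to the
    # target token's logit is its activation times the dot product of its
    # value vector (a row of the FFN down-projection) with that token's
    # unembedding row. No backward pass is required.
    token_dir = unembedding[token_id]          # (d_model,)
    contributions = w_out @ token_dir          # (num_neurons,)
    return activations * contributions         # (num_neurons,)

def edit_neurons(w_out, neuron_ids, unembedding, old_id, new_id, alpha=1.0):
    # Hypothetical editing rule: move the selected neurons' value vectors
    # away from the unwanted token's unembedding direction and toward the
    # replacement token's direction.
    with torch.no_grad():
        w_out[neuron_ids] += alpha * (unembedding[new_id] - unembedding[old_id])
    return w_out

# Toy usage with random tensors standing in for one FFN layer.
d_model, n_neurons, vocab = 64, 256, 1000
acts = torch.randn(n_neurons)                  # FFN activations at one position
w_out = torch.randn(n_neurons, d_model)        # FFN down-projection weights
unemb = torch.randn(vocab, d_model)            # output embedding matrix

scores = score_neurons(acts, w_out, unemb, token_id=42)
top_neurons = scores.topk(5).indices           # candidate "multi-modal neurons"
edit_neurons(w_out, top_neurons, unemb, old_id=42, new_id=7, alpha=0.5)

Selecting the top-k scored neurons mirrors the paper's notion of identifying key neurons for a concept token; the editing rule shown is a generic substitute and may differ from the authors' exact update.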
Pages: 1012-1037
Number of pages: 26