Aligning Medical Images with General Knowledge from Large Language Models

Cited by: 0
Authors
Fang, Xiao [1]
Lin, Yi [1]
Zhang, Dong [2]
Cheng, Kwang-Ting [2]
Chen, Hao [1, 3, 4]
Affiliations
[1] HKUST, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] HKUST, Dept Elect & Comp Engn, Hong Kong, Peoples R China
[3] HKUST, Dept Chem & Biol Engn, Hong Kong, Peoples R China
[4] HKUST Shenzhen Hong Kong Collaborat Innovat Res I, Shenzhen, Peoples R China
Source
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT X | 2024 / Vol. 15010
Keywords
Prompt Learning; Vision-Language Models; Large Language Model; Medical Image Analysis;
DOI
10.1007/978-3-031-72117-5_6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-trained large vision-language models (VLMs) like CLIP have revolutionized visual representation learning by using natural language as supervision, and have demonstrated promising generalization ability. In this work, we propose ViP, a novel visual symptom-guided prompt learning framework for medical image analysis, which facilitates general knowledge transfer from CLIP. ViP consists of two key components: a visual symptom generator (VSG) and a dual-prompt network. Specifically, VSG aims to extract explicable visual symptoms from pre-trained large language models, while the dual-prompt network uses these visual symptoms to guide the training of two learnable prompt modules, i.e., context prompt and merge prompt, effectively adapting our framework to medical image analysis via large VLMs. Extensive experimental results demonstrate that ViP can outperform state-of-the-art methods on two challenging datasets. The code is available at https://github.com/xiaofang007/ViP.
Pages: 57-67
Number of pages: 11