SAM-GUIDED ENHANCED FINE-GRAINED ENCODING WITH MIXED SEMANTIC LEARNING FOR MEDICAL IMAGE CAPTIONING

Cited by: 1
Authors
Zhang, Zhenyu [1]
Wang, Benlu [1]
Liang, Weijie [1]
Li, Yizhi [1]
Guo, Xuechen [1]
Wang, Guanhong [1]
Li, Shiyan [2]
Wang, Gaoang [1]
Affiliations
[1] Zhejiang Univ, Zhejiang Univ Illinois Urbana Champaign Inst, Hangzhou, Peoples R China
[2] Zhejiang Univ, Sch Med, Sir Run Run Shaw Hosp, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Medical Image; Multimodal; Image Captioning; Dual Image Encoders; Large Language Model;
DOI
10.1109/ICASSP48485.2024.10446878
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
With the development of multimodality and large language models, the deep learning-based technique for medical image captioning holds the potential to offer valuable diagnostic recommendations. However, current generic text and image pre-trained models do not yield satisfactory results when it comes to describing intricate details within medical images. In this paper, we present a novel medical image captioning method guided by the segment anything model (SAM) to enable enhanced encoding with both general and detailed feature extraction. In addition, our approach employs a distinctive pre-training strategy with mixed semantic learning to simultaneously capture both the overall information and finer details within medical images. We demonstrate the effectiveness of this approach, as it outperforms the pre-trained BLIP2 model on various evaluation metrics for generating descriptions of medical images.
Pages: 1731-1735
Number of pages: 5
Related papers
50 in total
  • [21] Semantic prior guided fine-grained facial expression manipulation
    Xue, Tao
    Yan, Jin
    Zheng, Deshuai
    Liu, Yong
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (03) : 4609 - 4624
  • [23] Semantic interaction learning for fine-grained vehicle recognition
    Zhang, Jingjing
    Lei, Jingsheng
    Yang, Shengying
    Yang, Xinqi
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33 (01)
  • [24] Learning to locate for fine-grained image recognition
    Chen, Jiamin
    Hu, Jianguo
    Li, Shiren
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 206
  • [25] Incremental Learning for Fine-Grained Image Recognition
    Cao, Liangliang
    Hsiao, Jenhao
    de Juan, Paloma
    Li, Yuncheng
    Thomee, Bart
    ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2016, : 363 - 366
  • [26] ADVERSARIAL LEARNING FOR FINE-GRAINED IMAGE SEARCH
    Lin, Kevin
    Yang, Fan
    Wang, Qiaosong
    Piramuthu, Robinson
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 490 - 495
  • [27] Fine-Grained Medical Image Synthesis with Dual-Attention Adversarial Learning
    Xiao, Qiuyu
    Nie, Dong
    MEDICAL IMAGE UNDERSTANDING AND ANALYSIS, PT II, MIUA 2024, 2024, 14860 : 298 - 306
  • [28] REO-Relevance, Extraness, Omission: A Fine-grained Evaluation for Image Captioning
    Jiang, Ming
    Hu, Junjie
    Huang, Qiuyuan
    Zhang, Lei
    Diesner, Jana
    Gao, Jianfeng
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1475 - 1480
  • [29] Context-Aware Visual Policy Network for Fine-Grained Image Captioning
    Zha, Zheng-Jun
    Liu, Daqing
    Zhang, Hanwang
    Zhang, Yongdong
    Wu, Feng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (02) : 710 - 722
  • [30] Image Difference Captioning With Instance-Level Fine-Grained Feature Representation
    Huang, Qingbao
    Liang, Yu
    Wei, Jielong
    Yi, Cai
    Liang, Hanyu
    Leung, Ho-fung
    Li, Qing
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 2004 - 2017