SAM-GUIDED ENHANCED FINE-GRAINED ENCODING WITH MIXED SEMANTIC LEARNING FOR MEDICAL IMAGE CAPTIONING

Cited by: 1
Authors
Zhang, Zhenyu [1]
Wang, Benlu [1]
Liang, Weijie [1]
Li, Yizhi [1]
Guo, Xuechen [1]
Wang, Guanhong [1]
Li, Shiyan [2]
Wang, Gaoang [1]
Affiliations
[1] Zhejiang Univ, Zhejiang Univ Illinois Urbana Champaign Inst, Hangzhou, Peoples R China
[2] Zhejiang Univ, Sch Med, Sir Run Run Shaw Hosp, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Medical Image; Multimodal; Image Captioning; Dual Image Encoders; Large Language Model;
DOI
10.1109/ICASSP48485.2024.10446878
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
With the development of multimodality and large language models, the deep learning-based technique for medical image captioning holds the potential to offer valuable diagnostic recommendations. However, current generic text and image pre-trained models do not yield satisfactory results when it comes to describing intricate details within medical images. In this paper, we present a novel medical image captioning method guided by the segment anything model (SAM) to enable enhanced encoding with both general and detailed feature extraction. In addition, our approach employs a distinctive pre-training strategy with mixed semantic learning to simultaneously capture both the overall information and finer details within medical images. We demonstrate the effectiveness of this approach, as it outperforms the pre-trained BLIP2 model on various evaluation metrics for generating descriptions of medical images.
Pages: 1731 - 1735
Number of pages: 5
Related papers
50 in total
  • [1] Fine-grained and Semantic-guided Visual Attention for Image Captioning
    Zhang, Zongjian
    Wu, Qiang
    Wang, Yang
    Chen, Fang
    2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 1709 - 1717
  • [2] High-Quality Image Captioning With Fine-Grained and Semantic-Guided Visual Attention
    Zhang, Zongjian
    Wu, Qiang
    Wang, Yang
    Chen, Fang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (07) : 1681 - 1693
  • [3] Fine-Grained Features for Image Captioning
    Shao, Mengyue
    Feng, Jie
    Wu, Jie
    Zhang, Haixiang
    Zheng, Yayu
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03): : 4697 - 4712
  • [4] ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor
    Hossen, Md. Bipul
    Ye, Zhongfu
    Abdussalam, Amr
    Hossain, Mohammad Alamgir
    DISPLAYS, 2024, 84
  • [5] Learning Relationship-Enhanced Semantic Graph for Fine-Grained Image-Text Matching
    Liu, Xin
    He, Yi
    Cheung, Yiu-Ming
    Xu, Xing
    Wang, Nannan
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (02) : 948 - 961
  • [6] Attention-Guided Hierarchical Parsing for Fine-Grained Person-Centric Image Captioning
    Gu, Zhengcheng
    Jin, Jing
    IEEE ACCESS, 2024, 12 : 86293 - 86301
  • [7] Semantic-Guided Information Alignment Network for Fine-Grained Image Recognition
    Wang, Shijie
    Wang, Zhihui
    Li, Haojie
    Chang, Jianlong
    Ouyang, Wanli
    Tian, Qi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (11) : 6558 - 6570
  • [8] Ultra Fine-Grained Image Semantic Embedding
    Juan, Da-Cheng
    Lu, Chun-To
    Li, Zhen
    Peng, Futang
    Timofeev, Aleksei
    Chen, Yi-Ting
    Gao, Yaxi
    Duerig, Tom
    Tomkins, Andrew
    Ravi, Sujith
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM '20), 2020, : 277 - 285
  • [9] FineFormer: Fine-Grained Adaptive Object Transformer for Image Captioning
    Wang, Bo
    Zhang, Zhao
    Fan, Jicong
    Zhao, Mingbo
    Zhan, Choujun
    Xu, Mingliang
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2022, : 508 - 517
  • [10] Learning Semantically Enhanced Feature for Fine-Grained Image Classification
    Luo, Wei
    Zhang, Hengmin
    Li, Jun
    Wei, Xiu-Shen
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 (27) : 1545 - 1549