SAM-GUIDED ENHANCED FINE-GRAINED ENCODING WITH MIXED SEMANTIC LEARNING FOR MEDICAL IMAGE CAPTIONING

Cited by: 1
Authors
Zhang, Zhenyu [1]
Wang, Benlu [1]
Liang, Weijie [1]
Li, Yizhi [1]
Guo, Xuechen [1]
Wang, Guanhong [1]
Li, Shiyan [2]
Wang, Gaoang [1]
Affiliations
[1] Zhejiang Univ, Zhejiang Univ Illinois Urbana Champaign Inst, Hangzhou, Peoples R China
[2] Zhejiang Univ, Sch Med, Sir Run Run Shaw Hosp, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Medical Image; Multimodal; Image Captioning; Dual Image Encoders; Large Language Model
DOI
10.1109/ICASSP48485.2024.10446878
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
With the development of multimodality and large language models, the deep learning-based technique for medical image captioning holds the potential to offer valuable diagnostic recommendations. However, current generic text and image pre-trained models do not yield satisfactory results when it comes to describing intricate details within medical images. In this paper, we present a novel medical image captioning method guided by the segment anything model (SAM) to enable enhanced encoding with both general and detailed feature extraction. In addition, our approach employs a distinctive pre-training strategy with mixed semantic learning to simultaneously capture both the overall information and finer details within medical images. We demonstrate the effectiveness of this approach, as it outperforms the pre-trained BLIP2 model on various evaluation metrics for generating descriptions of medical images.
Pages: 1731-1735
Page count: 5