Chest radiology report generation based on cross-modal multi-scale feature fusion

Cited: 4
|
Authors
Pan, Yu [1]
Liu, Li-Jun [1,2,3]
Yang, Xiao-Bing [1]
Peng, Wei [1]
Huang, Qing-Song [1]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming, Peoples R China
[2] Yunnan Key Lab Comp Technol Applicat, Kunming, Peoples R China
[3] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Wujiaying St, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Report generation; Cross-modal; Multi-scale; Medical image; Attention mechanism; Deep learning;
DOI
10.1016/j.jrras.2024.100823
CLC classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biosciences]; N [General Natural Sciences];
Discipline classification codes
07; 0710; 09;
Abstract
Chest radiology imaging plays a crucial role in the early screening, diagnosis, and treatment of chest diseases. The accurate interpretation of radiological images and the automatic generation of radiology reports not only save doctors' time but also mitigate the risk of diagnostic errors. The core objective of automatic radiology report generation is to achieve precise mapping between visual features and lesion descriptions at multi-scale and fine-grained levels. Existing methods typically combine global visual features and textual features to generate radiology reports. However, these approaches may ignore key lesion areas and lack sensitivity to crucial lesion location information. Furthermore, achieving multi-scale characterization and fine-grained alignment of medical visual features and report text features proves challenging, leading to a reduction in the quality of the generated reports. To address these issues, we propose a method for chest radiology report generation based on cross-modal multi-scale feature fusion. First, an auxiliary labeling module is designed to guide the model to focus on the lesion region of the radiological image. Second, a channel attention network is employed to enhance the characterization of location information and disease features. Finally, a cross-modal feature fusion module is constructed by combining memory matrices, facilitating fine-grained alignment between multi-scale visual features and report text features at corresponding scales. The proposed method is experimentally evaluated on two publicly available radiological image datasets. The results demonstrate superior performance on the BLEU and ROUGE metrics compared to existing methods. In particular, there are improvements of 4.8% in the ROUGE metric and 9.4% in the METEOR metric on the IU X-Ray dataset, along with a 7.4% improvement in BLEU-1 and a 7.6% improvement in BLEU-2 on the MIMIC-CXR dataset.
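The channel attention step mentioned in the abstract can be sketched in the style of squeeze-and-excitation attention: per-channel global average pooling ("squeeze"), followed by a small bottleneck network that produces per-channel scaling weights ("excitation"). The sketch below is a minimal illustrative NumPy version with randomly initialized weights standing in for learned parameters; it is not the authors' implementation, and the shapes and reduction ratio are assumptions.

```python
import numpy as np

def channel_attention(features, reduction=4, rng=None):
    """Squeeze-and-excitation style channel attention on a (C, H, W) feature map.

    Squeeze: global average pooling collapses each channel to one scalar.
    Excitation: a two-layer bottleneck (ReLU then sigmoid) maps those
    scalars to per-channel weights in (0, 1), which rescale the channels.
    Weights are random here purely for illustration; in practice they
    would be learned jointly with the rest of the model.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    c, h, w = features.shape
    squeezed = features.mean(axis=(1, 2))             # (C,) channel descriptors
    w1 = rng.standard_normal((c, c // reduction)) * 0.1
    w2 = rng.standard_normal((c // reduction, c)) * 0.1
    hidden = np.maximum(squeezed @ w1, 0.0)           # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # sigmoid gates, (C,)
    return features * weights[:, None, None]          # rescale each channel

# Example: reweight a 16-channel feature map.
feat = np.random.default_rng(1).standard_normal((16, 8, 8))
out = channel_attention(feat)
```

Because the gates lie strictly in (0, 1), informative channels are preserved while less useful ones are attenuated, which is how such a module can emphasize disease- and location-related features.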
Pages: 9
Related papers
50 items total
  • [21] Cross-Modal Hybrid Feature Fusion for Image-Sentence Matching
    Xu, Xing
    Wang, Yifan
    He, Yixuan
    Yang, Yang
    Hanjalic, Alan
    Shen, Heng Tao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (04)
  • [23] MFANet: Multi-scale feature fusion network with attention mechanism
    Wang, Gaihua
    Gan, Xin
    Cao, Qingcheng
    Zhai, Qianyu
    VISUAL COMPUTER, 2023, 39 (07) : 2969 - 2980
  • [24] Similarity Retrieval and Medical Cross-Modal Attention Based Medical Report Generation
    Dong, Xinxin
    Pan, Haiwei
    Lan, Haiyan
    Zhang, Kejia
    Chen, Chunling
    WEB AND BIG DATA, APWEB-WAIM 2024, PT I, 2024, 14961 : 171 - 185
  • [25] A Cross-Modal Guiding and Fusion Method for Multi-Modal RSVP-based Image Retrieval
    Mao, Jiayu
    Qiu, Shuang
    Li, Dan
    Wei, Wei
    He, Huiguang
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [26] Image registration combining cross-scale point matching and multi-scale feature fusion
    Ou, Zhuolin
    Lu, Xiaoqi
    Gu, Yu
    CHINESE JOURNAL OF LIQUID CRYSTALS AND DISPLAYS, 2024, 39 (08) : 1090 - 1102
  • [27] Underwater image object detection based on multi-scale feature fusion
    Yang, Chao
    Zhang, Ce
    Jiang, Longyu
    Zhang, Xinwen
    MACHINE VISION AND APPLICATIONS, 2024, 35 (06)
  • [28] MemoCMT: multimodal emotion recognition using cross-modal transformer-based feature fusion
    Khan, Mustaqeem
    Tran, Phuong-Nam
    Pham, Nhat Truong
    El Saddik, Abdulmotaleb
    Othmani, Alice
    SCIENTIFIC REPORTS, 2025, 15 (01)
  • [29] Wafer defect recognition method based on multi-scale feature fusion
    Chen, Yu
    Zhao, Meng
    Xu, Zhenyu
    Li, Kaiyue
    Ji, Jing
    FRONTIERS IN NEUROSCIENCE, 2023, 17
  • [30] Lightweight road extraction model based on multi-scale feature fusion
    Liu Y.
    Chen Y.
    Gao L.
    Hong J.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58 (05): 951 - 959