Chest radiology report generation based on cross-modal multi-scale feature fusion

Cited by: 4
Authors
Pan, Yu [1]
Liu, Li-Jun [1,2,3]
Yang, Xiao-Bing [1]
Peng, Wei [1]
Huang, Qing-Song [1]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming, Peoples R China
[2] Yunnan Key Lab Comp Technol Applicat, Kunming, Peoples R China
[3] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Wujiaying St, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Report generation; Cross-modal; Multi-scale; Medical image; Attention mechanism; Deep learning
DOI
10.1016/j.jrras.2024.100823
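Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract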
Chest radiology imaging plays a crucial role in the early screening, diagnosis, and treatment of chest diseases. Accurate interpretation of radiological images and automatic generation of radiology reports not only save doctors' time but also reduce the risk of diagnostic errors. The core objective of automatic radiology report generation is to map visual features precisely to lesion descriptions at multi-scale and fine-grained levels. Existing methods typically combine global visual features and textual features to generate radiology reports. However, these approaches may ignore key lesion areas and lack sensitivity to crucial lesion location information. Furthermore, multi-scale characterization and fine-grained alignment of medical visual features and report text features remain challenging, which reduces the quality of the generated reports. To address these issues, we propose a method for chest radiology report generation based on cross-modal multi-scale feature fusion. First, an auxiliary labeling module is designed to guide the model to focus on the lesion regions of the radiological image. Second, a channel attention network is employed to enhance the characterization of location information and disease features. Finally, a cross-modal feature fusion module is constructed by combining memory matrices, facilitating fine-grained alignment between multi-scale visual features and report text features at the corresponding scales. The proposed method is evaluated experimentally on two publicly available radiological image datasets. The results show superior performance on the BLEU and ROUGE metrics compared with existing methods. In particular, the ROUGE and METEOR metrics improve by 4.8% and 9.4%, respectively, on the IU X-Ray dataset, and BLEU-1 and BLEU-2 improve by 7.4% and 7.6%, respectively, on the MIMIC-CXR dataset.
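The abstract reports gains in BLEU-1, among other metrics. As a rough illustration of what that metric measures, the following is a minimal sketch of sentence-level BLEU-1 (clipped unigram precision multiplied by a brevity penalty) for a single reference. The function name `bleu1`, the whitespace tokenization, and the two example sentences are assumptions made for illustration; they are not taken from the paper, its datasets, or its evaluation toolkit, which the abstract does not specify.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against one reference (illustrative sketch only)."""
    # Assumption: lowercase whitespace tokenization; real toolkits may differ.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    cand_counts = Counter(cand)
    ref_counts = Counter(ref)

    # Clip each candidate word count by its count in the reference,
    # so repeating a matching word cannot inflate the score.
    overlap = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = overlap / len(cand)

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * precision


# Hypothetical report sentences, invented for this example.
generated = "no acute cardiopulmonary abnormality"
ground_truth = "no acute cardiopulmonary abnormality is seen"
score = bleu1(generated, ground_truth)
print(round(score, 4))
```

Here every generated word appears in the reference, so the unigram precision is 1, and the score is reduced only by the brevity penalty because the generated sentence is shorter than the reference. Corpus-level BLEU, as typically reported in papers, aggregates counts over all sentences rather than averaging per-sentence scores.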
Pages: 9