Vision-knowledge fusion model for multi-domain medical report generation

Cited by: 12
Authors
Xu, Dexuan [1, 2]
Zhu, Huashi [1, 2]
Huang, Yu [1]
Jin, Zhi [3]
Ding, Weiping [4]
Li, Hang [5, 6]
Ran, Menglong [5, 6]
Affiliations
[1] Peking Univ, Natl Engn Res Ctr Software Engn, Beijing 100871, Peoples R China
[2] Peking Univ, Sch Software & Microelect, Beijing 100871, Peoples R China
[3] Peking Univ, Key Lab High Confidence Software Technol, Beijing 100871, Peoples R China
[4] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
[5] Peking Univ, Dept Dermatol, Hosp 1, Beijing 100034, Peoples R China
[6] Natl Clin Res Ctr Skin & Immune Dis, Beijing 100034, Peoples R China
Keywords
Medical report generation; Knowledge graph; Multi-modal fusion; Graph neural network;
DOI
10.1016/j.inffus.2023.101817
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Medical report generation with knowledge graphs is an essential task in the medical field. Although existing knowledge graphs contain many entities, their semantics are insufficient because it is difficult to uniformly extract and fuse expert knowledge across different diseases. It is therefore necessary to construct domain-specific knowledge graphs automatically. In this paper, we propose a vision-knowledge fusion model based on medical images and knowledge graphs that makes full use of high-quality data from different diseases and languages. First, we give a general method for automatically constructing a knowledge graph for each domain from medical standards. Second, we design a knowledge-based attention mechanism to fuse images and knowledge effectively. Third, we build a triples restoration module to obtain fine-grained knowledge, and we propose, for the first time, knowledge-based evaluation metrics that are more reasonable and measurable along different dimensions. Finally, we conduct experiments to verify the effectiveness of our model on datasets for two different diseases: the public IU-Xray chest radiograph dataset and NCRC-DS, a dataset of Chinese dermoscopy reports that we compiled. Our model outperforms previous benchmark methods and achieves excellent evaluation scores on both datasets. Additionally, the interpretability and clinical usefulness of the model are validated, and our method can be generalized to multiple domains and different diseases.
Pages: 12
Related papers
44 in total
  • [21] Improving Medical X-ray Report Generation by Using Knowledge Graph
    Zhang, Dehai
    Ren, Anquan
    Liang, Jiashu
    Liu, Qing
    Wang, Haoxing
    Ma, Yu
    APPLIED SCIENCES-BASEL, 2022, 12 (21):
  • [22] KdTNet: Medical Image Report Generation via Knowledge-Driven Transformer
    Cao, Yiming
    Cui, Lizhen
    Yu, Fuqiang
    Zhang, Lei
    Li, Zhen
    Liu, Ning
    Xu, Yonghui
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III, 2022, : 117 - 132
  • [23] Medical Visual Question Answering Model Based on Knowledge Enhancement and Multimodal Fusion
    Dianyuan, Zhang
    Chuanming, Yu
    Data Analysis and Knowledge Discovery, 2024, 8 (8-9) : 226 - 239
  • [24] CGFTrans: Cross-Modal Global Feature Fusion Transformer for Medical Report Generation
    Xu, Liming
    Tang, Quan
    Zheng, Bochuan
    Lv, Jiancheng
    Li, Weisheng
    Zeng, Xianhua
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (09) : 5600 - 5612
  • [25] Auxiliary signal-guided knowledge encoder-decoder for medical report generation
    Li, Mingjie
    Liu, Rui
    Wang, Fuyu
    Chang, Xiaojun
    Liang, Xiaodan
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2023, 26 (01): 253 - 270
  • [26] Auxiliary signal-guided knowledge encoder-decoder for medical report generation
    Mingjie Li
    Rui Liu
    Fuyu Wang
    Xiaojun Chang
    Xiaodan Liang
    World Wide Web, 2023, 26 : 253 - 270
  • [27] Knowledge Enhanced Vision and Language Model for Multi-Modal Fake News Detection
    Gao, Xingyu
    Wang, Xi
    Chen, Zhenyu
    Zhou, Wei
    Hoi, Steven C. H.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8312 - 8322
  • [28] Automatic Medical Image Report Generation with Multi-view and Multi-modal Attention Mechanism
    Yang, Shaokang
    Niu, Jianwei
    Wu, Jiyan
    Liu, Xuefeng
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2020, PT III, 2020, 12454 : 687 - 699
  • [29] Radiology report generation with medical knowledge and multilevel image-report alignment: A new method and its verification
    Zhao, Guosheng
    Zhao, Zijian
    Gong, Wuxian
    Li, Feng
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 146
  • [30] Harnessing the Power of Pre-trained Vision-Language Models for Efficient Medical Report Generation
    Li, Qi
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 1308 - 1317