Weakly guided attention model with hierarchical interaction for brain CT report generation

Cited by: 3

Authors:

Zhang, Xiaodan [1]

Yang, Sisi [1]

Shi, Yanzhao [1]

Ji, Junzhong [1]

Liu, Ying [2]

Wang, Zheng [2]

Xu, Huimin [2]

Affiliations:

[1] Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China

[2] Peking Univ Third Hosp, Dept Radiol, Beijing, Peoples R China

Source:

COMPUTERS IN BIOLOGY AND MEDICINE | 2023 / Vol. 167

Funding:

National Natural Science Foundation of China;

Keywords:

Weakly guided attention; Hierarchical interaction; Brain CT; Medical report generation; NETWORK;

DOI:

10.1016/j.compbiomed.2023.107650

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

Brain Computed Tomography (CT) report generation, which aims to assist radiologists in diagnosing cerebrovascular diseases efficiently, is challenging because it requires feature representation for dozens of images and a language description spanning several sentences. Existing report generation methods have made significant progress based on the encoder-decoder framework and attention mechanisms. However, current methods are limited in handling the many-to-many alignment between the multiple images of a Brain CT scan and the multiple sentences of a Brain CT report, and they fail to attend to critical images and lesion areas, resulting in inaccurate descriptions. In this paper, we propose a novel Weakly Guided Attention Model with Hierarchical Interaction, named WGAM-HI, to improve Brain CT report generation. Specifically, WGAM-HI conducts many-to-many matching between multiple visual images and semantic sentences via a hierarchical interaction framework with a two-layer attention model and a two-layer report generator. In addition, two weakly guided mechanisms are proposed to help the attention model focus more on important images and lesion areas, under the guidance of pathological events and Gradient-weighted Class Activation Mapping (Grad-CAM), respectively. The pathological event acts as a bridge between the essential serial images and the corresponding sentence, and the Grad-CAM bridges the lesion areas and pathology words. Therefore, under the hierarchical interaction with the weakly guided attention model, the report generator generates more accurate words and sentences. Experiments on the Brain CT dataset demonstrate the effectiveness of WGAM-HI in gradually attending to important images and lesion areas and in generating more accurate reports.
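The two-layer attention idea in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the dot-product scoring, the mean-pooled image summaries, the cross-entropy form of the weak guidance term, and all names (`attend`, `hierarchical_attention`, `guidance_loss`) are assumptions made purely for illustration. The first level weights whole images against a sentence-level query, the second level weights regions within each image against a word-level query, and the guidance term penalizes attention that disagrees with a weak target such as a Grad-CAM map or pathological-event labels.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, features):
    """Dot-product attention (an assumed scoring function).

    Returns the attention weights and the weighted sum of the features.
    """
    weights = softmax([dot(query, f) for f in features])
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


def hierarchical_attention(sentence_query, word_query, scan):
    """Two-level attention over a scan.

    scan: list of images, each image a list of region feature vectors.
    """
    dim = len(scan[0][0])
    # Level 1: summarize each image by the mean of its regions (assumption),
    # then weight the images against the sentence-level query.
    summaries = [
        [sum(r[d] for r in regions) / len(regions) for d in range(dim)]
        for regions in scan
    ]
    image_weights, _ = attend(sentence_query, summaries)

    # Level 2: weight regions inside each image against the word-level query,
    # then mix the per-image contexts using the image-level weights.
    context = [0.0] * dim
    region_weights = []
    for w_img, regions in zip(image_weights, scan):
        r_weights, r_context = attend(word_query, regions)
        region_weights.append(r_weights)
        context = [c + w_img * x for c, x in zip(context, r_context)]
    return image_weights, region_weights, context


def guidance_loss(attention, guide):
    """Cross-entropy between a normalized weak guide and attention weights.

    The guide stands in for a Grad-CAM map or event labels (hypothetical form).
    """
    total = sum(guide)
    target = [g / total for g in guide]
    return -sum(t * math.log(a + 1e-12) for t, a in zip(target, attention))
```

In this sketch, adding `guidance_loss` to the usual captioning loss would nudge each attention level toward the weak target without requiring pixel-level annotation, which is the general sense in which such guidance is "weak".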

Pages: 12
