AIMNET: ADAPTIVE IMAGE-TAG MERGING NETWORK FOR AUTOMATIC MEDICAL REPORT GENERATION

Times Cited: 5
Authors
Shi, Jijun [1 ]
Wang, Shanshe [2 ]
Wang, Ronggang [1 ]
Ma, Siwei [2 ]
Affiliations
[1] Peking Univ, Sch Elect & Comp Engn, Beijing, Peoples R China
[2] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Funding
National Natural Science Foundation of China;
Keywords
Medical Report Generation; Data Bias; Attention Mechanism;
DOI
10.1109/ICASSP43922.2022.9747702
CLC Number
O42 [Acoustics];
Discipline Classification Codes
070206; 082403;
Abstract
In recent years, medical report generation has attracted increasing research interest, with the goal of automatically producing long, coherent paragraphs that describe in detail the observations of normal and abnormal regions in an input medical image. Unlike general image captioning, medical report generation is more challenging for data-driven neural models, mainly because of severe visual and textual data biases. To address these problems, we propose an Adaptive Image-Tag Merging Network (AIMNet) that first predicts disease tags from the input image and then adaptively merges the visual information of the image with the disease information carried by the tags. The resulting disease-oriented visual features better represent abnormal regions of the input image and thus help alleviate the data bias problem. Experiments and analyses on the public MIMIC-CXR and IU-Xray datasets show that AIMNet achieves state-of-the-art results on all metrics and outperforms previous models by relative margins of 15.1% and 6.5% in BLEU-4 score, respectively.
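To make the tag-then-merge idea in the abstract concrete, below is a minimal PyTorch sketch of a tag-guided feature-merging module written in the spirit of the description. It is not the authors' implementation: the module name AdaptiveImageTagMerge, the dimensions, the number of tags, and the exact fusion rule (cross-attention over probability-weighted tag embeddings followed by a learned gate) are all assumptions for illustration only.

# Minimal sketch (assumed design, not the paper's code): predict disease tags
# from pooled visual features, then adaptively merge visual and tag information
# to obtain disease-oriented visual features.
import torch
import torch.nn as nn


class AdaptiveImageTagMerge(nn.Module):
    def __init__(self, feat_dim=512, num_tags=14):
        super().__init__()
        self.tag_classifier = nn.Linear(feat_dim, num_tags)   # multi-label disease tag prediction
        self.tag_embedding = nn.Embedding(num_tags, feat_dim)  # one learnable embedding per tag
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.gate = nn.Linear(2 * feat_dim, feat_dim)          # learned fusion gate

    def forward(self, visual_feats):
        # visual_feats: (B, N, feat_dim) region/patch features from a CNN backbone
        pooled = visual_feats.mean(dim=1)                       # (B, feat_dim)
        tag_logits = self.tag_classifier(pooled)                # (B, num_tags)
        tag_probs = torch.sigmoid(tag_logits)                   # multi-label tag scores
        # weight each tag embedding by its predicted probability
        tag_feats = tag_probs.unsqueeze(-1) * self.tag_embedding.weight  # (B, num_tags, feat_dim)
        # visual features attend to the disease-tag features
        merged, _ = self.attn(visual_feats, tag_feats, tag_feats)        # (B, N, feat_dim)
        # gated fusion of the original visual and tag-informed features
        g = torch.sigmoid(self.gate(torch.cat([visual_feats, merged], dim=-1)))
        disease_oriented = g * merged + (1 - g) * visual_feats
        return disease_oriented, tag_logits

In such a setup, disease_oriented would feed the report decoder and tag_logits would be supervised with a multi-label tag loss; how AIMNet actually couples these two signals is described in the paper itself, not in this sketch.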
Pages: 7737-7741
Number of Pages: 5