Captioning Ultrasound Images Automatically

Cited by: 18
Authors
Alsharid, Mohammad [1]
Sharma, Harshita [1]
Drukker, Lior [2]
Chatelain, Pierre [1]
Papageorghiou, Aris T. [2]
Noble, J. Alison [1]
Affiliations
[1] Univ Oxford, Inst Biomed Engn, Oxford, England
[2] Univ Oxford, Nuffield Dept Womens & Reprod Hlth, Oxford, England
Source
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2019, PT IV | 2019, Vol. 11767
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Image description; Image captioning; Deep learning; Natural language processing; Recurrent neural networks; Fetal ultrasound;
DOI
10.1007/978-3-030-32251-9_37
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We describe an automatic natural language processing (NLP)-based image captioning method to describe fetal ultrasound video content by modelling the vocabulary commonly used by sonographers and sonologists. The generated captions are similar to the words spoken by a sonographer when describing the scan experience in terms of visual content and performed scanning actions. Using full-length second-trimester fetal ultrasound videos and text derived from accompanying expert voice-over audio recordings, we train deep learning models consisting of convolutional neural networks and recurrent neural networks in merged configurations to generate captions for ultrasound video frames. We evaluate different model architectures using established general metrics (BLEU, ROUGE-L) and application-specific metrics. Results show that the proposed models can learn joint representations of image and text to generate relevant and descriptive captions for anatomies, such as the spine, the abdomen, the heart, and the head, in clinical fetal ultrasound scans.
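To make the "merged configuration" of convolutional and recurrent networks concrete, below is a minimal sketch of a merge-style CNN-RNN captioning model, assuming Keras. The layer sizes, vocabulary size, caption length, and the use of pre-extracted frame features are illustrative assumptions, not the authors' reported configuration; the point is only that the image representation and the RNN-encoded caption prefix are combined after the recurrent layer and decoded into a next-word distribution.

import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 5000   # assumed vocabulary size
MAX_LEN = 20        # assumed maximum caption length in tokens
EMBED_DIM = 256     # assumed embedding / hidden size
FEAT_DIM = 2048     # assumed dimensionality of pooled CNN frame features

# Image branch: pre-extracted CNN features for one ultrasound frame,
# projected into the shared embedding space.
img_in = layers.Input(shape=(FEAT_DIM,), name="frame_features")
img_vec = layers.Dense(EMBED_DIM, activation="relu")(layers.Dropout(0.5)(img_in))

# Text branch: the caption generated so far, encoded by a recurrent network.
txt_in = layers.Input(shape=(MAX_LEN,), name="caption_prefix")
txt_emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(txt_in)
txt_vec = layers.LSTM(EMBED_DIM)(layers.Dropout(0.5)(txt_emb))

# "Merge" configuration: image and text representations are combined
# only after the RNN, then decoded into a distribution over the next word.
merged = layers.add([img_vec, txt_vec])
hidden = layers.Dense(EMBED_DIM, activation="relu")(merged)
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)

model = Model(inputs=[img_in, txt_in], outputs=next_word)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

At inference time a caption would be generated one word at a time, feeding each predicted word back into the caption prefix until an end token or MAX_LEN is reached; the generated captions can then be scored against reference transcriptions with BLEU and ROUGE-L, as in the paper.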
Pages: 338-346
Page count: 9