A survey of word embeddings for clinical text

Cited by: 66
Authors
Khattak F.K. [1, 3, 4]
Jeblee S. [1, 3]
Pou-Prom C. [1, 4]
Abdalla M. [1, 3]
Meaney C. [2, 3]
Rudzicz F. [1, 3, 4, 5]
Affiliations
[1] Department of Computer Science, University of Toronto, Toronto, Ontario
[2] Department of Biostatistics, University of Toronto, Toronto, Ontario
[3] Vector Institute for Artificial Intelligence, Toronto, Ontario
[4] Li Ka Shing Knowledge Institute, St Michael's Hospital, Toronto, Ontario
[5] Surgical Safety Technologies Inc, Toronto, Ontario
Keywords
Clinical data; Natural language processing; Word embeddings
DOI
10.1016/j.yjbinx.2019.100057
Abstract
Representing words as numerical vectors based on the contexts in which they appear has become the de facto method of analyzing text with machine learning. In this paper, we provide a guide for training these representations on clinical text data, using a survey of relevant research. Specifically, we discuss different types of word representations, clinical text corpora, available pre-trained clinical word vector embeddings, intrinsic and extrinsic evaluation, applications, and limitations of these approaches. This work can be used as a blueprint for clinicians and healthcare workers who may want to incorporate clinical text features in their own models and applications. © 2019 The Author(s)