Development of pre-trained language models for clinical NLP in Spanish

Cited by: 0
Authors
Aracena, Claudio [1, 2]
Dunstan, Jocelyn [2, 3, 4, 5]
Affiliations
[1] Univ Chile, Fac Phys & Math Sci, Santiago, Chile
[2] Millennium Inst Fdn Res Data, Santiago, Chile
[3] Pontificia Univ Catolica Chile, Dept Comp Sci, Santiago, Chile
[4] Pontificia Univ Catolica Chile, Inst Computat Math, Santiago, Chile
[5] Univ Chile, Ctr Math Modeling, Santiago, Chile
Source
17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 | 2023
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clinical natural language processing aims to tackle language and prediction tasks using text from medical practice, such as clinical notes, prescriptions, and discharge summaries. Several approaches have been tried to deal with these tasks. Since 2017, pre-trained language models (PLMs) have achieved state-of-the-art performance in many tasks. However, most works have been developed in English. This PhD research proposal addresses the development of PLMs for clinical NLP in Spanish. To carry out this study, we will build a clinical corpus big enough to implement a functional PLM. We will test several PLM architectures and evaluate them with language and prediction tasks. The novelty of this work lies in the use of only clinical text, while previous clinical PLMs have used a mix of general, biomedical, and clinical text.
Pages: 52-60
Page count: 9