A keyphrase-based approach for interpretable ICD-10 code classification of Spanish medical reports

Cited by: 7
Authors
Duque, Andres [1, 2]
Fabregat, Hermenegildo [1]
Araujo, Lourdes [1, 2]
Martinez-Romo, Juan [1, 2]
Affiliations
[1] Univ Nacl Educ Distancia UNED, ETS Ingn Informat, Juan Rosal 16, Madrid 28040, Spain
[2] Inst Mixto Invest Escuela Nacl Sanidad IMIENS, Madrid, Spain
Keywords
Medical records; ICD-10 codes; Keyphrase extraction; Interpretability; SYSTEM
DOI
10.1016/j.artmed.2021.102177
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Background and objectives: The 10th revision of the International Classification of Diseases (ICD-10) coding system has been widely adopted by the health systems of many countries, including Spain. However, manually assigning codes to Electronic Health Records (EHRs) is a complex and time-consuming task that requires a large amount of specialised human resources. Several machine learning approaches have therefore been proposed to assist in the assignment task. In this work we present an alternative system for automatically recommending ICD-10 codes to be assigned to EHRs. Methods: Our proposal characterises each ICD-10 code by a set of keyphrases that represent it. These keyphrases include not only those that appear literally in some EHR to which the code has been assigned, but also others obtained through a statistical process able to capture the expressions that led the annotators to assign the code. Results: The result is an information model that efficiently recommends codes for a new EHR based on its textual content. The approach proves competitive with other state-of-the-art approaches and can be combined with them to optimise results. Conclusions: In addition to being effective, the method produces easily interpretable recommendations, since the phrases in an EHR that lead to recommending an ICD-10 code are known. Moreover, the keyphrases associated with each ICD-10 code can be a valuable additional source of information for other approaches, such as machine learning techniques.
Pages: 16
Related papers
63 in total
[1]   ICD-10 Coding of Spanish Electronic Discharge Summaries: An Extreme Classification Problem [J].
Almagro, Mario ;
Martinez Unanue, Raquel ;
Fresno, Victor ;
Montalvo, Soto .
IEEE ACCESS, 2020, 8 :100073-100083
[2]   A cross-lingual approach to automatic ICD-10 coding of death certificates by exploring machine translation [J].
Almagro, Mario ;
Martinez, Raquel ;
Montalvo, Soto ;
Fresno, Victor .
JOURNAL OF BIOMEDICAL INFORMATICS, 2019, 94
[3]  
Almagro-Cádiz M, 2018, PROCESAMIENTO LENGUA, V60, P45
[4]  
[Anonymous], 2014, P 8 INT WORKSH SEM E
[5]  
Aronson AR, 2001, J AM MED INFORM ASSN, P17
[6]   Interpretable deep learning to map diagnostic texts to ICD-10 codes [J].
Atutxa, Aitziber ;
Diaz de Ilarraza, Arantza ;
Gojenola, Koldo ;
Oronoz, Maite ;
Perez-de-Vinaspre, Olatz .
INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2019, 129 :49-59
[7]  
Bhatia Kush, 2015, Advances in Neural Information Processing Systems, V28
[8]  
Bittar A, 2018, WORKING NOTES CLEF 2
[9]  
Blanco A, 2020, AUTOMATIC CLASSIFICA
[10]   Boosting ICD multi-label classification of health records with contextual embeddings and label-granularity [J].
Blanco, Alberto ;
Perez-de-Vinaspre, Olatz ;
Perez, Alicia ;
Casillas, Arantza .
COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2020, 188