Unsupervised Linkage between ICD- and Alpha-ID Codes and Real-World Diagnoses from Medical Reports by Means of the "word2vec" Method

Cited by: 3
Authors
Boehringer, Daniel [1, 2]
Lang, Stefan J. [1, 2]
Daniel, Moritz Claudius [1, 2]
Lapp, Thabo [1, 2]
Reinhard, Thomas [1, 2]
Affiliations
[1] Univ Klinikum Freiburg, Klin Augenheilkunde, Killianstr 5, D-79106 Freiburg, Germany
[2] Albert Ludwigs Univ Freiburg, Fak Med, Freiburg, Germany
Keywords
diagnosis; word2vec; medical report; INFORMATION; TEXT;
DOI
10.1055/a-1023-4490
Chinese Library Classification number
R77 [Ophthalmology];
Discipline classification code
100212 ;
Abstract
Background: Transformation into a standardised code system such as ICD-10 or Alpha-ID is required before medical reports can be scientifically analysed. This is due to the use of different terminologies and the frequent use of synonyms. The so-called "word vector embedding" seems to be suitable for the generation of the required thesaurus, because synonymous diagnoses can be identified independently of the spelling after suitable training of the underlying neural network.
Methods: All letters from a total of 50,000 patients were extracted anonymously. Diagnoses consisting of several words were merged into single words by means of phrase recognition, and the "word2vec" model was trained on the text corpus of 352 megabytes. A total of 3742 diagnoses and ophthalmological interventions were extracted semi-automatically. The ophthalmological ICD and Alpha-ID codes were downloaded together with the official descriptions from the DIMDI website, and the ophthalmological diagnoses/interventions were automatically linked with the nearest ICD- and Alpha-ID codes in the "word2vec" model.
Results: The "word2vec" model assigned 90% of the doctor's letter diagnoses correctly to appropriate ICD-10 codes. At the finer level of Alpha-ID, the rate of correct assignments was only 76%. The interventions were assigned to the correct indication in 92% of cases. Rare diseases, unusual designations and code degeneration in the official DIMDI file were identified as sources of error for incorrect or missing allocations.
Discussion: A diagnostic thesaurus can be generated with the "word2vec" method from a corpus of anonymised medical reports and the official Alpha-ID file from the DIMDI website. This thesaurus could be used for automatic extraction of diagnoses from doctor's letters in the future, given appropriate manual revision.
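The two mechanical steps named in the Methods (merging multi-word diagnoses into single tokens, then linking each diagnosis to its nearest code in the embedding space) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state which similarity measure was used, so cosine similarity is assumed here, and the function names `merge_phrases`, `cosine` and `nearest_code`, the phrase list, and all vector values are hypothetical. In practice the vectors would come from a trained word2vec model rather than being written by hand.

```python
import math


def merge_phrases(tokens, phrases):
    """Join known multi-word phrases into single tokens.

    `phrases` is a set of word tuples (2 or 3 words long in this sketch).
    Longer phrases are tried first so that they win over shorter ones.
    """
    merged = []
    i = 0
    while i < len(tokens):
        for n in (3, 2):
            candidate = tuple(tokens[i:i + n])
            if len(candidate) == n and candidate in phrases:
                merged.append("_".join(candidate))
                i += n
                break
        else:
            # No phrase starts at position i: keep the single token.
            merged.append(tokens[i])
            i += 1
    return merged


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_code(term, term_vectors, code_vectors):
    """Return the code whose vector is most similar to the term's vector.

    Returns None when the term has no vector (e.g. it never occurred
    in the training corpus).
    """
    vector = term_vectors.get(term)
    if vector is None:
        return None
    return max(code_vectors, key=lambda code: cosine(vector, code_vectors[code]))


# Hypothetical toy data: hand-written 3-dimensional vectors standing in
# for embeddings that a trained word2vec model would provide.
PHRASES = {("senile", "cataract"), ("open", "angle", "glaucoma")}
TERM_VECTORS = {
    "senile_cataract": [0.9, 0.1, 0.0],
    "open_angle_glaucoma": [0.1, 0.9, 0.1],
}
CODE_VECTORS = {
    "H25": [1.0, 0.0, 0.0],  # ICD-10 H25: senile cataract
    "H40": [0.0, 1.0, 0.0],  # ICD-10 H40: glaucoma
}

tokens = merge_phrases(["senile", "cataract", "right", "eye"], PHRASES)
linked = {t: nearest_code(t, TERM_VECTORS, CODE_VECTORS) for t in tokens}
```

In this sketch, tokens without a vector map to `None`, which loosely mirrors the missing allocations the Results attribute to rare diseases and unusual designations. Any such automatically generated linkage would still need the manual revision mentioned in the Discussion.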
Pages: 1413-1417
Page count: 5