Term Extraction from Medical Documents Using Word Embeddings

Times cited: 0
Authors
Bay, Matthias [1]
Bruness, Daniel [2]
Herold, Miriam [3]
Schulze, Christian [4]
Guckert, Michael [4]
Minor, Mirjam [3]
Affiliations
[1] MINDS Med GmbH, Frankfurt, Germany
[2] TH Mittelhessen, KITE Kompetenzzentrum Informationstechnol, Friedberg, Germany
[3] Goethe Univ, Dept Business Informat, Frankfurt, Germany
[4] TH Mittelhessen, Dept MND, Friedberg, Germany
Source
2020 6TH IEEE CONGRESS ON INFORMATION SCIENCE AND TECHNOLOGY (IEEE CIST'20) | 2020
Keywords
Machine learning; natural language processing; text mining; term extraction; machine learning applications
DOI
10.1109/CIST49399.2021.9357263
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we present a new method for extracting discipline-specific terms from medical documents. Because the available text corpora are small and medical documents are highly specialized, approaches based solely on term frequencies are limited. Combining such methods with procedures that are sensitive to semantic aspects is therefore promising. We use word embeddings in a neighborhood-context-based method, which we call Snowball because it works layer by layer. Snowball is integrated with established methods into an end-to-end pipeline that processes documents and extracts relevant terms. A proof of concept is given on a gold standard recently created together with experts in medical coding. The preliminary results demonstrate the feasibility of our approach and its potential for automated, machine-learning-based text processing in the medical context.
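The abstract describes Snowball only at a high level (a neighborhood-context method over word embeddings that grows layer by layer). The sketch below shows one plausible reading of that idea: starting from seed terms, each layer adds the k nearest embedding neighbors of the previous layer's terms whose cosine similarity clears a threshold. It is not the authors' implementation; the function names (snowball_expand, nearest_neighbors), the parameters k, threshold and n_layers, and the toy 2-D vectors are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Mapping, Sequence, Tuple

# Hypothetical sketch of a layerwise embedding-neighborhood expansion.
# NOT the authors' Snowball implementation; all names and parameters are assumptions.
Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(term: str, embeddings: Mapping[str, Vector],
                      k: int) -> List[Tuple[str, float]]:
    """Return the k most similar other terms, ties broken alphabetically."""
    query = embeddings[term]
    scored = [(other, cosine(query, vec))
              for other, vec in embeddings.items() if other != term]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


def snowball_expand(seeds: Sequence[str], embeddings: Mapping[str, Vector],
                    k: int = 3, threshold: float = 0.9,
                    n_layers: int = 2) -> Dict[str, int]:
    """Grow a term set layer by layer; map each term to the layer that added it."""
    # Layer 0: the seed terms that actually have an embedding.
    layer_of: Dict[str, int] = {s: 0 for s in seeds if s in embeddings}
    frontier = list(layer_of)
    for layer in range(1, n_layers + 1):
        next_frontier: List[str] = []
        for term in frontier:
            for neighbor, sim in nearest_neighbors(term, embeddings, k):
                # Accept only sufficiently similar terms not seen in earlier layers.
                if sim >= threshold and neighbor not in layer_of:
                    layer_of[neighbor] = layer
                    next_frontier.append(neighbor)
        if not next_frontier:
            break  # the "snowball" has stopped growing
        frontier = next_frontier
    return layer_of


# Made-up 2-D vectors for illustration; real embeddings come from a trained model.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "appendicitis": (1.0, 0.1),
    "appendectomy": (0.95, 0.2),
    "laparoscopy": (0.85, 0.4),
    "peritonitis": (0.6, 0.6),
    "patient": (0.1, 1.0),
}

if __name__ == "__main__":
    print(snowball_expand(["appendicitis"], TOY_EMBEDDINGS, k=3, threshold=0.9, n_layers=2))
    # prints {'appendicitis': 0, 'appendectomy': 1, 'laparoscopy': 1, 'peritonitis': 2}
```

With these toy vectors, "peritonitis" is too far from the seed to be accepted in layer 1 but is reached in layer 2 via "laparoscopy", while the general word "patient" is never accepted. In a pipeline of the kind the abstract outlines, the seeds might plausibly come from a frequency-based extractor, but that wiring is likewise an assumption.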
Pages: 328-333
Number of pages: 6