Mining fall-related information in clinical notes: Comparison of rule-based and novel word embedding-based machine learning approaches

Cited by: 59
Authors
Topaz, Maxim [1, 2, 3]
Murga, Ludmila [4 ]
Gaddis, Katherine M. [5 ]
McDonald, Margaret V. [3 ]
Bar-Bachar, Ofrit [4 ]
Goldberg, Yoav [6 ]
Bowles, Kathryn H. [3, 5]
Affiliations
[1] Columbia Univ, Sch Nursing, 560 W 168th St, New York, NY 10032 USA
[2] Columbia Univ, Data Sci Inst, New York, NY USA
[3] Visiting Nurse Serv New York, New York, NY USA
[4] Univ Haifa, Cheryl Spencer Dept Nursing, Haifa, Israel
[5] Univ Penn, Sch Nursing, Philadelphia, PA 19104 USA
[6] Bar Ilan Univ, Dept Comp Sci, Tel Aviv, Israel
Keywords
Natural language processing; Word embedding models; Nursing informatics; Text mining; Falls
DOI
10.1016/j.jbi.2019.103103
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Background: Natural language processing (NLP) of health-related data is still an expertise-demanding and resource-intensive process. We created a novel, open-source rapid clinical text mining system called NimbleMiner. NimbleMiner combines several machine learning techniques (word embedding models and positive-only labels learning) to help a human rapidly mine clinical narratives with the aid of the machine learning components. Objective: This manuscript describes the general system architecture and user interface and presents results of a case study aimed at classifying fall-related information (including fall history, fall prevention interventions, and fall risk) in homecare visit notes. Methods: We extracted a corpus of homecare visit notes (n = 1,149,586) for 89,459 patients from a large US-based homecare agency. We used a gold standard testing dataset of 750 notes annotated by two human reviewers to compare NimbleMiner's ability to classify documents by whether they contain fall-related information with that of a previously developed rule-based NLP system. Results: NimbleMiner outperformed the rule-based system in almost all domains. The overall F-score was 85.8% compared to 81% for the rule-based system, with the best performance for identifying general fall history (F = 89% vs. F = 85.1% rule-based), followed by fall risk (F = 87% vs. F = 78.7% rule-based), fall prevention interventions (F = 88.1% vs. F = 78.2% rule-based), and fall within 2 days of the note date (F = 83.1% vs. F = 80.6% rule-based). The rule-based system achieved slightly better performance for fall within 2 weeks of the note date (F = 81.9% vs. F = 84% rule-based). Discussion & conclusions: NimbleMiner outperformed other systems aimed at fall information classification, including our previously developed rule-based approach. These promising results indicate that clinical text mining can be implemented without the large labeled datasets required by other types of machine learning. This is critical for domains with little NLP development, such as nursing or the allied health professions.
Pages: 8