Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation

Cited by: 21
Authors
Ferraro, Jeffrey P. [1, 2]
Daume, Hal, III [3]
DuVall, Scott L. [4, 5]
Chapman, Wendy W. [6]
Harkema, Henk [7]
Haug, Peter J. [1, 2]
Affiliations
[1] Univ Utah, Dept Biomed Informat, Salt Lake City, UT USA
[2] Intermt Healthcare, Homer Warner Ctr Informat Res, Salt Lake City, UT USA
[3] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[4] Univ Utah, Dept Internal Med, Salt Lake City, UT USA
[5] VA Salt Lake City Healthcare Syst, Salt Lake City, UT USA
[6] Univ Calif San Diego, Dept Biomed Informat, La Jolla, CA 92093 USA
[7] Nuance Commun, Pittsburgh, PA USA
Keywords
Natural Language Processing; NLP; POS Tagging; Domain Adaptation; Clinical Narratives; Sample Selection; System; Text; Corpus
DOI
10.1136/amiajnl-2012-001453
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Objective: Natural language processing (NLP) tasks are commonly decomposed into subtasks, chained together to form processing pipelines. The residual error produced in these subtasks propagates, adversely affecting the end objectives. Limited availability of annotated clinical data remains a barrier to reaching state-of-the-art operating characteristics using statistically based NLP tools in the clinical domain. Here we explore the unique linguistic constructions of clinical texts and demonstrate the loss in operating characteristics when out-of-the-box part-of-speech (POS) tagging tools are applied to the clinical domain. We test a domain adaptation approach integrating a novel lexical-generation probability rule used in a transformation-based learner to boost POS performance on clinical narratives.
Methods: Two target corpora from independent healthcare institutions were constructed from high frequency clinical narratives. Four leading POS taggers with their out-of-the-box models trained from general English and biomedical abstracts were evaluated against these clinical corpora. A high performing domain adaptation method, Easy Adapt, was compared to our newly proposed method ClinAdapt.
Results: The evaluated POS taggers drop in accuracy by 8.5-15% when tested on clinical narratives. The highest performing tagger reports an accuracy of 88.6%. Domain adaptation with Easy Adapt reports accuracies of 88.3-91.0% on clinical texts. ClinAdapt reports 93.2-93.9%.
Conclusions: ClinAdapt successfully boosts POS tagging performance through domain adaptation requiring a modest amount of annotated clinical data. Improving the performance of critical NLP subtasks is expected to reduce pipeline error propagation leading to better overall results on complex processing tasks.
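The abstract names Easy Adapt as the baseline domain adaptation method but does not describe it. As generally described in the literature, Easy Adapt is a feature-augmentation trick: every feature is duplicated into a shared copy and a domain-specific copy, so a standard learner can decide which evidence transfers across domains. The following is a minimal sketch of that idea only, not the authors' implementation and not ClinAdapt; the function name `easy_adapt`, the `general:` prefix, and the example token features are all assumptions made for illustration.

```python
from typing import Dict


def easy_adapt(features: Dict[str, float], domain: str) -> Dict[str, float]:
    """Return an augmented feature map: one shared copy plus one domain-specific copy."""
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        augmented[f"general:{name}"] = value   # copy shared by every domain
        augmented[f"{domain}:{name}"] = value  # copy visible only to this domain
    return augmented


# Hypothetical token features, for illustration only.
source_token = easy_adapt({"word=discharge": 1.0, "prev_tag=DT": 1.0}, "source")
target_token = easy_adapt({"word=discharge": 1.0, "prev_tag=NN": 1.0}, "target")

# Only the shared ("general:") copies can overlap between the two domains.
shared = sorted(set(source_token) & set(target_token))
print(shared)
```

A tagger trained on the augmented features from both the general-English and the clinical data can put weight on the shared copy when a cue behaves the same in both domains, and on the domain-specific copy when clinical usage differs.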
Pages: 931-939
Page count: 9