Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation

Cited by: 21
Authors
Ferraro, Jeffrey P. [1, 2]
Daume, Hal, III [3]
DuVall, Scott L. [4, 5]
Chapman, Wendy W. [6]
Harkema, Henk [7]
Haug, Peter J. [1, 2]
Affiliations
[1] Univ Utah, Dept Biomed Informat, Salt Lake City, UT USA
[2] Intermt Healthcare, Homer Warner Ctr Informat Res, Salt Lake City, UT USA
[3] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
[4] Univ Utah, Dept Internal Med, Salt Lake City, UT USA
[5] VA Salt Lake City Healthcare Syst, Salt Lake City, UT USA
[6] Univ Calif San Diego, Dept Biomed Informat, La Jolla, CA 92093 USA
[7] Nuance Commun, Pittsburgh, PA USA
Keywords
Natural Language Processing; NLP; POS Tagging; Domain Adaptation; Clinical Narratives; Sample Selection; System; Text; Corpus
DOI
10.1136/amiajnl-2012-001453
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Objective: Natural language processing (NLP) tasks are commonly decomposed into subtasks that are chained together to form processing pipelines. The residual error produced in these subtasks propagates, adversely affecting the end objectives. Limited availability of annotated clinical data remains a barrier to reaching state-of-the-art operating characteristics with statistically based NLP tools in the clinical domain. Here we explore the unique linguistic constructions of clinical texts and demonstrate the loss in operating characteristics when out-of-the-box part-of-speech (POS) tagging tools are applied to the clinical domain. We test a domain adaptation approach that integrates a novel lexical-generation probability rule into a transformation-based learner to boost POS performance on clinical narratives.

Methods: Two target corpora from independent healthcare institutions were constructed from high-frequency clinical narratives. Four leading POS taggers, with their out-of-the-box models trained on general English and biomedical abstracts, were evaluated against these clinical corpora. A high-performing domain adaptation method, Easy Adapt, was compared to our newly proposed method, ClinAdapt.

Results: The evaluated POS taggers drop in accuracy by 8.5-15% when tested on clinical narratives. The highest-performing tagger reports an accuracy of 88.6%. Domain adaptation with Easy Adapt reports accuracies of 88.3-91.0% on clinical texts. ClinAdapt reports 93.2-93.9%.

Conclusions: ClinAdapt successfully boosts POS tagging performance through domain adaptation that requires a modest amount of annotated clinical data. Improving the performance of critical NLP subtasks is expected to reduce pipeline error propagation, leading to better overall results on complex processing tasks.
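
For readers unfamiliar with the Easy Adapt baseline named in the Methods, the following is a minimal sketch of the feature-augmentation idea it is generally associated with: every feature is copied into a shared version and a domain-specific version, so a standard learner can decide which behavior is common to both domains and which is particular to one. The abstract does not describe the implementation, so the function name `easy_adapt`, the `general::` prefix, and the example token features are assumptions made purely for illustration, not the authors' code.

```python
from typing import Dict


def easy_adapt(features: Dict[str, float], domain: str) -> Dict[str, float]:
    """Augment a feature dict with a shared copy and a domain-specific copy.

    `domain` is a label such as "source" (e.g. general English) or
    "target" (e.g. clinical narratives). Names here are hypothetical.
    """
    augmented: Dict[str, float] = {}
    for name, value in features.items():
        # Shared copy: carries weight learned from both domains.
        augmented[f"general::{name}"] = value
        # Domain-specific copy: carries weight learned from one domain only.
        augmented[f"{domain}::{name}"] = value
    return augmented


# Hypothetical features for the same token seen in each domain.
token_features = {"word=discharge": 1.0, "suffix=ge": 1.0}
source_token = easy_adapt(token_features, "source")
target_token = easy_adapt(token_features, "target")
```

Only the `general::` features overlap between the two augmented dictionaries, which is what lets a tagger trained on the combined data share evidence across domains while still learning clinical-specific weights.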
Pages: 931-939
Page count: 9