Knowledge Author: facilitating user-driven, domain content development to support clinical information extraction

Cited by: 8
Authors
Scuba, William [1]
Tharp, Melissa [1]
Mowery, Danielle [1]
Tseytlin, Eugene [2]
Liu, Yang [3]
Drews, Frank A. [4]
Chapman, Wendy W. [1]
Affiliations
[1] Univ Utah, Dept Biomed Informat, Salt Lake City, UT 84108 USA
[2] Univ Pittsburgh, Dept Biomed Informat, Pittsburgh, PA 15206 USA
[3] Univ Calif San Diego, San Diego, CA 92093 USA
[4] Univ Utah, Dept Psychol, Salt Lake City, UT 84108 USA
Keywords
Natural Language Processing; Information extraction; Semantics; Knowledge representation; Unified Medical Language System; Electronic health records; System; Text
DOI
10.1186/s13326-016-0086-9
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Clinical Natural Language Processing (NLP) systems require a semantic schema composed of domain-specific concepts, their lexical variants, and associated modifiers to accurately extract information from clinical texts. An NLP system leverages this schema to structure concepts and extract meaning from free text. In the clinical domain, creating a semantic schema typically requires input from both a domain expert, such as a clinician, and an NLP expert who translates the clinical concepts drawn from the clinician's domain expertise into a computable format usable by an NLP system. The goal of this work is to develop a web-based tool, Knowledge Author, that bridges the gap between the clinical domain expert and NLP system development by facilitating the development of domain content represented in a semantic schema for extracting information from clinical free text.

Results: Knowledge Author is a web-based recommendation system that supports users in developing the domain content necessary for clinical NLP applications. Knowledge Author's schematic model leverages a set of semantic types derived from the Secondary Use Clinical Element Models and the Common Type System to allow the user to quickly create and modify domain-related concepts. Features such as collaborative development and domain content suggestions, provided by mapping concepts to the Unified Medical Language System Metathesaurus database, further support the domain content creation process. Two proof-of-concept studies were performed to evaluate the system's performance. The first study evaluated Knowledge Author's flexibility to create a broad range of concepts. Of a dataset of 115 concepts, 87 (76%) could be created using Knowledge Author. The second study evaluated the effectiveness of Knowledge Author's output in an NLP system by extracting concepts and associated modifiers representing a clinical element, carotid stenosis, from 34 clinical free-text radiology reports using Knowledge Author and an NLP system, pyConText. Knowledge Author's domain content produced high recall for concepts (targeted findings: 86%) and varied recall for modifiers (certainty: 91%, sidedness: 80%, neurovascular anatomy: 46%).

Conclusion: Knowledge Author can support clinical domain content development for information extraction by supporting semantic schema creation by domain experts.
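As an illustration only, the kind of semantic schema the abstract describes (a concept, its lexical variants, and associated modifiers) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Knowledge Author's or pyConText's actual data model or API: the `Concept` class, the `find_mentions` and `recall` helpers, the variant and modifier values, and the sample sentence are all hypothetical. Only the figures 87 and 115 come from the abstract.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical schema entry: a concept, its lexical variants, and its modifiers."""
    name: str
    variants: list[str]
    modifiers: dict[str, list[str]] = field(default_factory=dict)


def find_mentions(concept: Concept, text: str) -> list[str]:
    """Return the variants of `concept` found in `text` (case-insensitive, whole words)."""
    found = []
    for variant in concept.variants:
        pattern = r"\b" + re.escape(variant) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(variant)
    return found


def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN), the metric reported for concepts and modifiers."""
    return true_positives / (true_positives + false_negatives)


# Illustrative values only; the real schema content is not given in the abstract.
stenosis = Concept(
    name="carotid stenosis",
    variants=["stenosis", "narrowing", "occlusion"],
    modifiers={"sidedness": ["left", "right", "bilateral"]},
)

report = "There is moderate narrowing of the left internal carotid artery."
mentions = find_mentions(stenosis, report)

# Share of the 115 test concepts that could be created (first study): 87 / 115.
creatable_share = round(100 * 87 / 115)
```

Keeping variants and modifiers as plain lists and dictionaries mirrors the idea that a domain expert supplies terms while the NLP system consumes them in a computable form; a real system would also handle negation, context windows, and UMLS mappings.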
Pages: 11