The Yale cTAKES extensions for document classification: architecture and application

Cited by: 61
Authors
Garla, Vijay [1]
Lo Re, Vincent, III [2]
Dorey-Stein, Zachariah [2]
Kidwai, Farah [3]
Scotch, Matthew [3, 4]
Womack, Julie [3, 5]
Justice, Amy [3, 6]
Brandt, Cynthia [3, 7]
Affiliations
[1] Yale Univ, Ctr Med Informat, Interdept Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
[2] Univ Penn, Sch Med, Ctr Clin Epidemiol & Biostat, Philadelphia, PA 19104 USA
[3] Connecticut VA Healthcare Syst, West Haven, CT USA
[4] Arizona State Univ, Dept Biomed Informat, Tempe, AZ USA
[5] Yale Univ, Sch Nursing, New Haven, CT 06520 USA
[6] Yale Univ, Sch Med, Gen Internal Med, New Haven, CT 06520 USA
[7] Yale Univ, Ctr Med Informat, Sch Med, New Haven, CT 06520 USA
Keywords
SYSTEM; INFORMATION; EXTRACTION; RETRIEVAL; KNOWLEDGE; SUPPORT
DOI
10.1136/amiajnl-2011-000093
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Background: Open-source clinical natural-language-processing (NLP) systems have lowered the barrier to developing effective clinical document classification systems. These systems annotate the syntax and semantics of clinical text; however, feature extraction and representation for document classification still pose technical challenges.
Methods: The authors developed extensions to the clinical Text Analysis and Knowledge Extraction System (cTAKES) that simplify feature extraction, experimentation with various feature representations, and the development of both rule-based and machine-learning-based document classifiers. The authors describe and evaluate their system, the Yale cTAKES Extensions (YTEX), on the classification of radiology reports that contain findings suggestive of hepatic decompensation.
Results and discussion: The system achieved an F1-score of 96% for the retrieval of abdominal radiology reports, and F1-scores of 79%, 91%, and 95% for the presence of liver masses, ascites, and varices, respectively. The authors released YTEX as open source, available at http://code.google.com/p/ytex.
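Note on the metric: the F1-score reported in the abstract is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `f1_score` and the example counts are assumptions chosen for illustration and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    tp, fp, fn are counts of true positives, false positives, and
    false negatives for one class (e.g. "report mentions ascites").
    """
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; not results from the paper.
example = f1_score(tp=90, fp=10, fn=10)
print(f"F1 = {example:.2f}")
```

Because the F1-score ignores true negatives, it suits retrieval tasks such as this one, where the documents of interest are a small fraction of the collection.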
Pages: 614-620
Page count: 7