Using natural language processing to extract mammographic findings

Cited by: 25
Authors
Gao, Hongyuan [1]
Bowles, Erin J. Aiello [1]
Carrell, David [1]
Buist, Diana S. M. [1]
Affiliations
[1] Grp Hlth Res Inst, Seattle, WA 98101 USA
Keywords
Natural language processing; SAS-based; Evaluation; Mammographic findings; Classification; Algorithm
DOI
10.1016/j.jbi.2015.01.010
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Objective: Structured data on mammographic findings are difficult to obtain without manual review. We developed and evaluated a rule-based natural language processing (NLP) system to extract mammographic findings from free-text mammography reports.
Materials and Methods: The NLP system extracted four mammographic findings (mass, calcification, asymmetry, and architectural distortion) using a dictionary look-up method on 93,705 mammography reports from Group Health. Status annotations and anatomical location annotations were linked to each NLP-detected finding through association rules. After excluding negated, uncertain, and historical findings, affirmative mentions of detected findings were summarized. Confidence flags were developed to denote reports with highly confident NLP results and reports with possible NLP errors. A random sample of 100 reports was manually abstracted to evaluate the accuracy of the system.
Results: The NLP system correctly coded 96 to 99 of the 100 sampled reports, depending on the finding. Sensitivity, specificity, and negative predictive value exceeded 0.92 for all findings. Positive predictive values were relatively low for some findings due to their low prevalence.
Discussion: Our NLP system was implemented entirely in SAS Base, which makes it portable and easy to implement. It performed reasonably well and supports multiple applications, such as using confidence flags as a filter to improve the efficiency of manual review. Refining the term library and association rules, and testing on more diverse samples, may further improve its performance.
Conclusion: Our NLP system successfully extracts clinically useful information from mammography reports. Moreover, SAS is a feasible platform for implementing NLP algorithms.
© 2015 Elsevier Inc. All rights reserved.
Pages: 77-84
Number of pages: 8
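The pipeline summarized in the abstract (dictionary look-up, status annotation, then keeping only affirmative mentions) can be illustrated with a minimal sketch. This is not the authors' implementation: their system was written in SAS Base, whereas Python is used here purely for illustration. The term lists, the cue lists, and the names `FINDING_TERMS`, `classify_status`, and `extract_findings` are all hypothetical, and the rule "look for a cue earlier in the same sentence" is a simplifying assumption rather than the paper's association rules.

```python
import re

# Hypothetical mini-dictionary mapping each finding to surface terms.
# The paper's actual term library is not reproduced here.
FINDING_TERMS = {
    "mass": ["mass", "masses", "nodule"],
    "calcification": ["calcification", "calcifications", "microcalcifications"],
    "asymmetry": ["asymmetry", "asymmetric density"],
    "architectural distortion": ["architectural distortion"],
}

# Hypothetical status cues, searched for in the text that precedes a
# finding term within the same sentence (a simplifying assumption).
NEGATION_CUES = ["no", "without", "negative for"]
HISTORICAL_CUES = ["previously", "prior", "history of"]
UNCERTAIN_CUES = ["possible", "probable", "questionable", "cannot exclude"]


def _has_cue(text, cues):
    """Return True if any cue occurs in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues)


def classify_status(prefix):
    """Assign a status to a finding from the text preceding it."""
    if _has_cue(prefix, NEGATION_CUES):
        return "negated"
    if _has_cue(prefix, HISTORICAL_CUES):
        return "historical"
    if _has_cue(prefix, UNCERTAIN_CUES):
        return "uncertain"
    return "affirmative"


def extract_findings(report):
    """Return the set of findings with at least one affirmative mention."""
    affirmed = set()
    # Split on sentence-like boundaries so cues do not leak across sentences.
    for sentence in re.split(r"[.;\n]+", report.lower()):
        for finding, terms in FINDING_TERMS.items():
            for term in terms:
                match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
                if match and classify_status(sentence[:match.start()]) == "affirmative":
                    affirmed.add(finding)
    return affirmed


report = (
    "There is a spiculated mass in the left breast. "
    "No suspicious calcifications. "
    "Possible asymmetry in the right breast. "
    "Architectural distortion is present in the upper outer quadrant."
)
print(sorted(extract_findings(report)))  # ['architectural distortion', 'mass']
```

In this toy report the calcification mention is dropped as negated and the asymmetry mention as uncertain, which mirrors the exclusion step described in the abstract. Anatomical location annotation and confidence flags are omitted from the sketch.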