Automatic inference of BI-RADS final assessment categories from narrative mammography report findings

Cited by: 16
Authors
Banerjee, Imon [1]
Bozkurt, Selen [1,2]
Alkim, Emel [1]
Sagreiya, Hersh [3]
Kurian, Allison W. [4]
Rubin, Daniel L. [1,3]
Affiliations
[1] Stanford Univ, Sch Med, Dept Biomed Data Sci, Stanford, CA 94305 USA
[2] Akdeniz Univ, Fac Med, Dept Biostat & Med Informat, TR-07059 Antalya, Turkey
[3] Stanford Univ, Sch Med, Dept Radiol, Stanford, CA 94305 USA
[4] Stanford Univ, Med Oncol & Hlth Res & Policy, Sch Med, Stanford, CA 94305 USA
Funding
National Institutes of Health (US)
Keywords
BI-RADS classification; Deep learning; Mammography report; NLP; Distributional semantics; Text mining; Data system; Classification; Variability
DOI
10.1016/j.jbi.2019.103137
Chinese Library Classification
TP39 [Computer applications]
Subject Classification Codes
081203; 0835
Abstract
We propose an efficient natural language processing approach for inferring BI-RADS final assessment categories by analyzing only the mammogram findings reported by the mammographer in narrative form. The proposed hybrid method integrates semantic term embedding with distributional semantics, producing a context-aware vector representation of unstructured mammography reports. A large corpus of unannotated mammography reports (300,000) was used to learn the context of key terms with a distributional semantics approach, and the trained model was applied to generate context-aware vector representations of the reports annotated with a BI-RADS category (22,091). The vectorized reports were used to train a supervised classifier that derives the BI-RADS assessment class. Even though the majority of the proposed embedding pipeline is unsupervised, the classifier was able to capture substantial semantic information for deriving the BI-RADS categorization, not only on a held-out internal test set but also on an external validation set (1900 reports). Our proposed method outperforms a recently published domain-specific rule-based system and could be relevant for evaluating concordance between radiologists. With minimal requirements for task-specific customization, the proposed method can be easily transferred to a different domain to support large-scale text mining or derivation of patient phenotypes.
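A minimal sketch of the two-stage pipeline the abstract describes, under assumptions not stated in the paper: gensim's Word2Vec stands in for the distributional-semantics step, mean pooling of word vectors approximates the context-aware report representation, and scikit-learn's logistic regression plays the supervised classifier. The toy corpora below stand in for the 300,000 unannotated and 22,091 annotated reports.

```python
# Sketch of the two-stage pipeline: unsupervised embedding training on an
# unannotated corpus, then supervised BI-RADS classification on the annotated
# subset. Library choices and mean pooling are illustrative assumptions.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

def tokenize(report):
    return report.lower().split()

# Stage 1: learn term context from a large unannotated corpus
# (toy sentences stand in for the ~300,000 reports used in the paper).
unannotated = [
    "scattered fibroglandular densities no suspicious mass or calcification",
    "spiculated mass in the upper outer quadrant highly suspicious",
    "benign appearing calcifications stable since prior exam",
]
w2v = Word2Vec([tokenize(r) for r in unannotated],
               vector_size=50, window=5, min_count=1, epochs=50)

# Context-aware report vector, approximated here by averaging word vectors.
def report_vector(report):
    vecs = [w2v.wv[t] for t in tokenize(report) if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.wv.vector_size)

# Stage 2: train a supervised classifier on the annotated subset
# (~22,091 labeled reports in the paper; two toy examples here).
annotated = [
    ("no suspicious mass or calcification", 1),  # BI-RADS 1: negative
    ("spiculated mass highly suspicious", 5),    # BI-RADS 5: highly suggestive
]
X = np.stack([report_vector(r) for r, _ in annotated])
y = [c for _, c in annotated]
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([report_vector("suspicious spiculated mass noted")]))
```

Because stage 1 never sees labels, the expensive corpus-level training can be reused across tasks; only the small labeled set and the final classifier are task-specific, which is what makes the approach transferable to other domains.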
Pages: 11