Natural Language Processing to Ascertain Cancer Outcomes From Medical Oncologist Notes

Cited by: 63
Authors
Kehl, Kenneth L. [1, 2]
Xu, Wenxin [2, 3]
Lepisto, Eva [1, 2]
Elmarakeby, Haitham [1, 2, 4]
Hassett, Michael J. [1, 2]
Van Allen, Eliezer M. [1, 2, 4]
Johnson, Bruce E. [1, 2]
Schrag, Deborah [1, 2]
Affiliations
[1] Dana Farber Canc Inst, Boston, MA 02115 USA
[2] Harvard Med Sch, Boston, MA 02115 USA
[3] Beth Israel Deaconess Med Ctr, Boston, MA 02215 USA
[4] Broad Inst, Cambridge, MA USA
Keywords
TRANSLATIONAL RESEARCH; EXTRACTION
DOI
10.1200/CCI.20.00020
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
PURPOSE Cancer research using electronic health records and genomic data sets requires clinical outcomes data, which may be recorded only in unstructured text by treating oncologists. Natural language processing (NLP) could substantially accelerate extraction of this information.
METHODS Patients with lung cancer who had tumor sequencing as part of a single-institution precision oncology study from 2013 to 2018 were identified. Medical oncologists' progress notes for these patients were reviewed. For each note, curators recorded whether the assessment/plan indicated any cancer, progression/worsening of disease, and/or response to therapy or improving disease. Next, a recurrent neural network was trained using unlabeled notes to extract the assessment/plan from each note. Finally, convolutional neural networks were trained on labeled assessments/plans to predict the probability that each curated outcome was present. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) among a held-out test set of 10% of patients. Associations between curated response or progression end points and overall survival were measured using Cox models among patients receiving palliative-intent systemic therapy.
RESULTS Medical oncologist notes (n = 7,597) were manually curated for 919 patients. In the 10% test set, NLP models replicated human curation with AUROCs of 0.94 for the any-cancer outcome, 0.86 for the progression outcome, and 0.90 for the response outcome. Progression/worsening events identified using NLP models were associated with shortened survival (hazard ratio [HR] for mortality, 2.49; 95% CI, 2.00 to 3.09); response/improvement events were associated with improved survival (HR, 0.45; 95% CI, 0.30 to 0.67).
CONCLUSION NLP models based on neural networks can extract meaningful outcomes from oncologist notes at scale. Such models may facilitate identification of clinical and genomic features associated with response to cancer treatment.
© 2020 by American Society of Clinical Oncology
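The abstract reports model performance as AUROC on a held-out test set. As a minimal illustration of what that metric measures, the sketch below computes AUROC as the fraction of (positive, negative) pairs in which the positive example receives the higher predicted probability, with ties counted as one half. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the paper; this is not the authors' evaluation code.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 curated outcomes (1 = outcome present).
    scores: iterable of predicted probabilities, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = curator marked the outcome as present.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
```

On this reading, an AUROC of 0.5 corresponds to chance-level ranking and 1.0 to every positive note being scored above every negative note.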
Pages: 680-690
Page count: 11