Natural Language Processing to Identify Cancer Treatments With Electronic Medical Records

Cited by: 24
Authors
Zeng, Jiaming [1]
Banerjee, Imon [2]
Henry, A. Solomon [3]
Wood, Douglas J. [3]
Shachter, Ross D. [4]
Gensheimer, Michael F. [5]
Rubin, Daniel L. [6]
Affiliations
[1] Huang Engn Ctr, Dept Management Sci & Engn, 338J, 475 Via Ortega, Stanford, CA 94305 USA
[2] Emory Univ, Dept Biomed Informat, Dept Radiol, Sch Med, Atlanta, GA USA
[3] Stanford Univ, Res Informat Ctr, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Management Sci & Engn, Sch Engn, Stanford, CA USA
[5] Stanford Univ, Dept Radiat Oncol, Sch Med, Stanford, CA USA
[6] Stanford Univ, Dept Biomed Data Sci Radiol & Med Biomed Informat, Sch Med, Stanford, CA USA
Source
JCO CLINICAL CANCER INFORMATICS | 2021 / Vol. 5
Keywords
REGISTRY DATA; INFORMATION; ACCURACY; TEXT; EXTRACTION; ONCOLOGY; OUTCOMES; DISEASE; QUALITY; STAGE;
DOI
10.1200/CCI.20.00173
Chinese Library Classification number
R73 [Oncology];
Discipline classification code
100214;
Abstract
PURPOSE Knowing the treatments administered to patients with cancer is important for treatment planning and for correlating treatment patterns with outcomes in personalized medicine research. However, existing methods to identify treatments are often lacking. We developed a natural language processing approach that uses structured electronic medical records and unstructured clinical notes to identify the initial treatment administered to patients with cancer.
METHODS We used data from 4,412 patients with 483,782 clinical notes from the Stanford Cancer Institute Research Database, covering patients with nonmetastatic prostate, oropharynx, and esophagus cancer. We trained treatment identification models for each cancer type separately and compared the performance of using only structured data, only unstructured data (bag-of-words, doc2vec, fastText), and combinations of both (structured + bag-of-words, structured + doc2vec, structured + fastText). We selected the identification model from among five machine learning methods (logistic regression, multilayer perceptrons, random forest, support vector machines, and stochastic gradient boosting). We treated the treatment information recorded in the cancer registry as the gold standard and compared our methods with an identification baseline that uses billing codes.
RESULTS For prostate cancer, we achieved an F1-score of 0.99 (95% CI, 0.97 to 1.00) for radiation and 1.00 (95% CI, 0.99 to 1.00) for surgery using structured + doc2vec. For oropharynx cancer, we achieved an F1-score of 0.78 (95% CI, 0.58 to 0.93) for chemoradiation and 0.83 (95% CI, 0.69 to 0.95) for surgery using doc2vec. For esophagus cancer, we achieved an F1-score of 1.0 (95% CI, 1.0 to 1.0) for both chemoradiation and surgery using all combinations of structured and unstructured data. Employing the free-text clinical notes outperformed using the billing codes or only structured data for all three cancer types.
CONCLUSION Our results show that treatment identification using free-text clinical notes greatly improves on the performance obtained with billing codes and simple structured data. The approach can be used for treatment cohort identification and adapted for longitudinal cancer treatment identification. © 2021 by American Society of Clinical Oncology
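The results above are reported as F1-scores with 95% confidence intervals. The abstract does not say how those intervals were computed, so the following is only a minimal sketch of one common choice, a percentile bootstrap, and not the authors' procedure. The function names, the toy labels, and the resampling settings are all hypothetical and chosen for illustration.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # Convention (an assumption): return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1 (assumed method, not from the paper)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample patients with replacement, keeping label/prediction pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy data: 1 = patient received the treatment according to the registry
# (gold standard); predictions are made-up model outputs.
registry_labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
model_predictions = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

point_estimate = f1_score(registry_labels, model_predictions)
ci_lower, ci_upper = bootstrap_f1_ci(registry_labels, model_predictions)
print(f"F1 = {point_estimate:.2f} (95% CI, {ci_lower:.2f} to {ci_upper:.2f})")
```

With only ten toy patients the interval is very wide; narrow intervals such as those reported for prostate cancer are only possible with a much larger evaluation set.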
Pages: 379-393
Number of pages: 15