Development and validation of a pancreatic cancer risk model for the general population using electronic health records: An observational study

Cited by: 36
Authors
Appelbaum, Limor [1]
Cambronero, Jose P. [2]
Stevens, Jennifer P. [3]
Horng, Steven [4]
Pollick, Karla [3]
Silva, George [3]
Haneuse, Sebastien [5]
Piatkowski, Gail [3]
Benhaga, Nordine [1]
Duey, Stacey [6]
Stevenson, Mary A. [1]
Mamon, Harvey [7]
Kaplan, Irving D. [1]
Rinard, Martin C. [2]
Affiliations
[1] Beth Israel Deaconess Med Ctr, Dept Radiat Oncol, 330 Brookline Ave, Boston, MA 02215 USA
[2] MIT, Comp Sci & Artificial Intelligence Lab, 32 Vassar St, Cambridge, MA 02139 USA
[3] Beth Israel Deaconess Med Ctr, Ctr Healthcare Delivery Sci, 330 Brookline Ave, Boston, MA 02215 USA
[4] Beth Israel Deaconess Med Ctr, Div Emergency Med Informat, 330 Brookline Ave, Boston, MA 02215 USA
[5] Harvard Univ, TH Chan Sch Publ Hlth, 677 Huntington Ave, Boston, MA 02115 USA
[6] Brigham & Womens Hosp, Partners Res IS & Comp, Dept Informat Syst, 75 Francis St, Boston, MA 02115 USA
[7] Harvard Med Sch, Dana Farber Canc Inst Radiat Oncol, Brigham & Womens Hosp, 75 Francis St, Boston, MA 02115 USA
Keywords
Pancreatic carcinoma; Adenocarcinoma; Electronic health records; Logistic regression models; AUC; Ductal adenocarcinoma; Prediction model; Early diagnosis
DOI
10.1016/j.ejca.2020.10.019
Chinese Library Classification (CLC) number
R73 [Oncology]
Discipline classification code
100214
Abstract
Aim: Pancreatic ductal adenocarcinoma (PDAC) is often diagnosed at a late, incurable stage. We sought to determine whether individuals at high risk of developing PDAC could be identified early using routinely collected data.
Methods: Electronic health record (EHR) databases from two independent hospitals in Boston, Massachusetts, providing inpatient, outpatient, and emergency care, from 1979 through 2017, were used with case-control matching. PDAC cases were selected using International Classification of Diseases 9/10 codes and validated with tumour registries. A data-driven feature selection approach was used to develop neural networks and L2-regularised logistic regression (LR) models on training data (594 cases, 100,787 controls) and compared with a published model based on hand-selected diagnoses ('baseline'). Model performance was validated on an external database (408 cases, 160,185 controls). Three prediction lead times (180, 270 and 365 days) were considered.
Results: The LR model had the best performance, with an area under the curve (AUC) of 0.71 (confidence interval [CI]: 0.67-0.76) for the training set, and AUC 0.68 (CI: 0.65-0.71) for the validation set, 365 days before diagnosis. Data-driven feature selection improved results over 'baseline' (AUC = 0.55; CI: 0.52-0.58). The LR model flags 2692 (CI: 2592-2791) of 156,485 as high risk, 365 days in advance, identifying 25 (CI: 16-36) cancer patients. Risk stratification showed that the high-risk group presented a cancer rate 3 to 5 times the prevalence in our data set.
Conclusion: A simple EHR model, based on diagnoses, can identify high-risk individuals for PDAC up to one year in advance. This inexpensive, systematic approach may serve as the first sieve for selection of individuals for PDAC screening programs.
© 2020 Elsevier Ltd. All rights reserved.
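The counts quoted in the Results can be turned into screening-style ratios with simple arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, it uses the point estimates from the abstract and ignores the reported confidence intervals, and it is not the authors' evaluation code.

```python
# Point estimates quoted in the abstract (365-day lead time).
# All names below are hypothetical and chosen for illustration.
FLAGGED_HIGH_RISK = 2692   # individuals the LR model flags as high risk
SCORED = 156_485           # individuals scored by the model
CASES_AMONG_FLAGGED = 25   # flagged individuals later diagnosed with PDAC


def flag_rate(flagged: int, scored: int) -> float:
    """Fraction of the scored population that is flagged as high risk."""
    return flagged / scored


def positive_predictive_value(cases: int, flagged: int) -> float:
    """Fraction of flagged individuals who turn out to be cancer cases."""
    return cases / flagged


def flagged_per_case(flagged: int, cases: int) -> float:
    """Number of flagged individuals per cancer case identified."""
    return flagged / cases


rate = flag_rate(FLAGGED_HIGH_RISK, SCORED)
ppv = positive_predictive_value(CASES_AMONG_FLAGGED, FLAGGED_HIGH_RISK)
per_case = flagged_per_case(FLAGGED_HIGH_RISK, CASES_AMONG_FLAGGED)

print(f"Flagged as high risk: {rate:.2%}")           # 1.72%
print(f"Cancer rate among flagged: {ppv:.2%}")       # 0.93%
print(f"Flagged individuals per case: {per_case:.1f}")  # 107.7
```

On these point estimates, roughly 1.7% of scored individuals are flagged, and about 1 in 108 of those flagged is a cancer case. The abstract does not give the number of cases among the 156,485 scored individuals, so the "3 to 5 times the prevalence" figure cannot be recomputed from these numbers alone.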
Pages: 19-30
Page count: 12