FLEXIBLE RISK PREDICTION MODELS FOR LEFT OR INTERVAL-CENSORED DATA FROM ELECTRONIC HEALTH RECORDS

Cited by: 17
Authors
Hyun, Noorie [1]
Cheung, Li C. [1]
Pan, Qing [2]
Schiffman, Mark [1]
Katki, Hormuzd A. [1]
Affiliations
[1] NCI, Div Canc Epidemiol & Genet, Rockville, MD 20850 USA
[2] George Washington Univ, Dept Stat, Washington, DC 20052 USA
Keywords
Mixture model; interval censoring; two-phase sampling; B-splines; weighted likelihood; HIV; MAXIMUM-LIKELIHOOD-ESTIMATION; FAILURE TIME MODEL; HUMAN-PAPILLOMAVIRUS; CERVICAL-CANCER; MANAGEMENT; REGRESSION; HPV; GUIDELINES; INFERENCE; WOMEN;
DOI
10.1214/17-AOAS1036
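Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes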
020208 ; 070103 ; 0714 ;
Abstract
Electronic health records are a large and cost-effective data source for developing risk-prediction models. However, for screen-detected diseases, standard risk models (such as Kaplan-Meier or Cox models) do not account for key issues encountered with electronic health record data: left-censoring of pre-existing (prevalent) disease, interval-censoring of incident disease, and ambiguity of whether disease is prevalent or incident when definitive disease ascertainment is not conducted at baseline. Furthermore, researchers might conduct novel screening tests only on a complex two-phase subsample. We propose a family of weighted mixture models that account for left/interval-censoring and complex sampling via inverse-probability weighting in order to estimate current and future absolute risk: we propose a weakly-parametric model for general use and a semiparametric model for checking goodness of fit of the weakly-parametric model. We demonstrate asymptotic properties analytically and by simulation. We used electronic health records to assemble a cohort of 33,295 human papillomavirus (HPV) positive women undergoing cervical cancer screening at Kaiser Permanente Northern California (KPNC) that underlie current screening guidelines. The next guidelines would focus on HPV typing tests, but reporting 14 HPV types is too complex for clinical use. The National Cancer Institute, along with KPNC, conducted an HPV typing test on a complex subsample of 9258 women in the cohort. We used our model to estimate the risk due to each type and grouped the 14 types (the 3-year risk ranges 21.9-1.5) into 4 risk-bands to simplify reporting to clinicians and guidelines. These risk-bands could be adopted by future HPV typing tests and future screening guidelines.
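The weighted mixture idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Weibull form for incident disease is an assumption made here for brevity (the abstract's weakly-parametric model is not reproduced), and the names `cumulative_risk` and `weighted_loglik` and the record layout are hypothetical. Cumulative risk at time t is taken to be prevalence plus (1 - prevalence) times the incident-disease CDF, and each subject contributes the probability mass of her censoring interval, weighted by the inverse of her sampling probability.

```python
import math


def cumulative_risk(t, prevalence, shape, scale):
    """Mixture cumulative risk: prevalent disease plus incident disease by time t.

    Assumption for illustration: incident disease follows a Weibull distribution.
    """
    incident_cdf = 1.0 - math.exp(-((t / scale) ** shape))
    return prevalence + (1.0 - prevalence) * incident_cdf


def weighted_loglik(records, prevalence, shape, scale):
    """Inverse-probability-weighted log-likelihood for left/interval-censored data.

    Each record is (left, right, weight):
      left   - last time known disease-free, or None if disease status was
               never definitively ascertained (prevalent disease is possible)
      right  - first time disease was detected, or None if never detected
      weight - inverse of the subject's sampling probability
    """
    total = 0.0
    for left, right, weight in records:
        # Risk accumulated before the interval; 0 when prevalent disease
        # cannot be ruled out, so the prevalence mass stays in the interval.
        lower = 0.0 if left is None else cumulative_risk(left, prevalence, shape, scale)
        # Risk accumulated by the end of the interval; 1 when right-censored.
        upper = 1.0 if right is None else cumulative_risk(right, prevalence, shape, scale)
        total += weight * math.log(upper - lower)
    return total
```

Under these assumptions, a record `(None, r, w)` contributes the full mixture risk at r (disease may be prevalent or incident), `(0.0, r, w)` contributes only incident risk because the subject was disease-free at baseline, and `(l, None, w)` is right-censored at l. Maximizing this function over the three parameters with a general-purpose optimizer would give weighted estimates.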
Pages: 1063-1084
Page count: 22