On Data-Enriched Logistic Regression

Times cited: 0
Authors
Zheng, Cheng [1]
Dasgupta, Sayan [2]
Xie, Yuxiang [3]
Haris, Asad [3]
Chen, Ying-Qing [4]
Affiliations
[1] Univ Nebraska Med Ctr, Dept Biostat, Omaha, NE 68198 USA
[2] Fred Hutchinson Canc Ctr, Vaccine & Infect Dis Div, Seattle, WA 98109 USA
[3] Univ Washington, Dept Biostat, Seattle, WA 98195 USA
[4] Stanford Univ, Dept Med, Palo Alto, CA 94305 USA
Keywords
risk prediction; logistic regression; shrinkage estimator; big data; variable selection; regularization; shrinkage; model; risk
DOI
10.3390/math13030441
Chinese Library Classification (CLC) number
O1 [Mathematics]
Subject classification code
0701; 070101
Abstract
Biomedical researchers typically investigate the effects of specific exposures on disease risk within a well-defined population. The gold standard for such studies is a trial with an appropriately sampled cohort. However, because such trials are expensive, the collected sample sizes are often limited, making it difficult to estimate the effects of certain exposures accurately. In this paper, we discuss how to leverage information from external "big data" (datasets with much larger sample sizes) to improve estimation accuracy at the cost of introducing a small amount of bias. We propose a family of weighted estimators that balance the increase in bias against the reduction in variance when the big data are incorporated. We establish a connection between the proposed estimator and well-known penalized regression estimators. We derive optimal weights using both second-order and higher-order asymptotic expansions. Through extensive simulation studies, we demonstrate that the reduction in mean squared error (MSE) for the regression coefficients can be substantial even with finite sample sizes, and that our weighted method outperforms existing approaches such as penalized regression and the James-Stein estimator. Additionally, we show theoretically that, in general, the proposed estimators never yield a larger asymptotic MSE than the maximum likelihood estimator based on the small data alone. Finally, we apply the proposed methods to the Asia Cohort Consortium China cohort data to estimate the relationships of age, BMI, smoking, and alcohol use with mortality.
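The abstract does not spell out the estimator itself. As a hedged illustration of the bias-variance trade-off it describes, the sketch below fits a one-covariate logistic regression to a small and a large simulated dataset, then combines the two slope estimates as (1 - w) * beta_s + w * beta_b with a plug-in weight w chosen to minimize an estimated MSE. The convex-combination rule, the assumption that the two samples are independent, the bias estimate, the function names (`fit_logistic`, `composite`, `simulate`) and all simulation settings are assumptions made for illustration; this is not the authors' weighted estimator or their higher-order optimal weights.

```python
import math
import random


def _score_and_info(a, b, x, y):
    """Score vector and observed information for logit P(y=1|x) = a + b*x."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        r, w = yi - p, p * (1.0 - p)
        g0 += r
        g1 += r * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def fit_logistic(x, y, iters=25):
    """Newton-Raphson MLE; returns the slope and its estimated variance."""
    a = b = 0.0
    for _ in range(iters):
        (g0, g1), (h00, h01, h11) = _score_and_info(a, b, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += I^{-1} * score, with I the 2x2 information
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    _, (h00, h01, h11) = _score_and_info(a, b, x, y)
    # Var(b) is the (2,2) entry of the inverse information matrix
    return b, h00 / (h00 * h11 - h01 * h01)


def composite(beta_s, var_s, beta_b, var_b):
    """Convex combination (1-w)*beta_s + w*beta_b with a plug-in MSE-optimal w.

    Assumes independent estimates, so MSE(w) = (1-w)^2 var_s + w^2 (var_b + delta^2),
    minimized at w = var_s / (var_s + var_b + delta^2), delta = big-data bias.
    """
    # E[(beta_b - beta_s)^2] = delta^2 + var_s + var_b under independence,
    # so subtract the variances to estimate delta^2, truncating at zero.
    delta2 = max((beta_b - beta_s) ** 2 - var_s - var_b, 0.0)
    w = var_s / (var_s + var_b + delta2)
    return w, (1.0 - w) * beta_s + w * beta_b


def simulate(n, a, b, rng):
    """Draw x ~ N(0, 1) and y ~ Bernoulli(logistic(a + b*x))."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(a + b * xi))) else 0 for xi in x]
    return x, y


rng = random.Random(2025)
# Hypothetical setup: a small target cohort (true slope 0.5) and a large external
# dataset whose slope differs slightly (0.6), i.e. a mildly biased "big data" source.
xs, ys = simulate(200, -1.0, 0.5, rng)
xb, yb = simulate(5000, -1.0, 0.6, rng)

beta_s, var_s = fit_logistic(xs, ys)
beta_b, var_b = fit_logistic(xb, yb)
w, beta_c = composite(beta_s, var_s, beta_b, var_b)
print(f"small MLE {beta_s:.3f}, big MLE {beta_b:.3f}, w = {w:.3f}, combined {beta_c:.3f}")
```

In this simplified setting, a large estimated bias drives w toward 0, so the combined estimate falls back to the small-data MLE; when the two estimates agree, w approaches var_s / (var_s + var_b) and the big data dominate.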
Pages: 21
Related papers
  • [1] Data-enriched interpolation for temporally consistent population compositions
    Zoraghein, Hamidreza
    Leyk, Stefan
    GISCIENCE & REMOTE SENSING, 2019, 56 (03) : 430 - 461
  • [2] Structural transformations for data-enriched real-time systems
    Olderog, Ernst-Ruediger
    Swaminathan, Mani
    FORMAL ASPECTS OF COMPUTING, 2015, 27 (04) : 727 - 750
  • [3] Data-enriched edible pharmaceuticals (DEEP) of medical cannabis by inkjet printing
    Oblom, Heidi
    Cornett, Claus
    Botker, Johan
    Frokjaer, Sven
    Hansen, Harald
    Rades, Thomas
    Rantanen, Jukka
    Genina, Natalja
    INTERNATIONAL JOURNAL OF PHARMACEUTICS, 2020, 589
  • [4] Efficient representation of turbulent flows using data-enriched finite elements
    Deshmukh, Rohit
    Shilt, Troy
    McNamara, Jack J.
    INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, 2020, 121 (15) : 3397 - 3416
  • [5] edCrumble, a Data-Enriched Visual Authoring Design Tool for Blended Learning
    Albo, Laia
    Hernandez-Leo, Davinia
    IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES, 2021, 14 (01) : 55 - 68
  • [6] LRpath: a logistic regression approach for identifying enriched biological groups in gene expression data
    Sartor, Maureen A.
    Leikauf, George D.
    Medvedovic, Mario
    BIOINFORMATICS, 2009, 25 (02) : 211 - 217
  • [7] Data-Enriched Edible Pharmaceuticals (DEEP) with Bespoke Design, Dose and Drug Release
    Chao, Meie
    Oeblom, Heidi
    Cornett, Claus
    Botker, Johan
    Rantanen, Jukka
    Sporrong, Sofia Kaelvemark
    Genina, Natalja
    PHARMACEUTICS, 2021, 13 (11)
  • [8] Harnessing personalized tailored medicines to digital-based data-enriched edible pharmaceuticals
    Handa, Mayank
    Afzal, Obaid
    Beg, Sarwar
    Sanap, Sachin Nashik
    Kaundal, Ravinder K.
    Verma, Rahul K.
    Mishra, Awanish
    Shukla, Rahul
    DRUG DISCOVERY TODAY, 2023, 28 (05)
  • [9] Logistic Regression for Circular Data
    Al-Daffaie, Kadhem
    Khan, Shahjahan
    3RD ISM INTERNATIONAL STATISTICAL CONFERENCE 2016 (ISM III): BRINGING PROFESSIONALISM AND PRESTIGE IN STATISTICS, 2017, 1842
  • [10] Data enriched linear regression
    Chen, Aiyou
    Owen, Art B.
    Shi, Minghui
    ELECTRONIC JOURNAL OF STATISTICS, 2015, 9 (01) : 1078 - 1112