The effect of sample size and missingness on inference with missing data

Cited by: 0
Authors
Morimoto, Julian [1]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
Keywords
Incomplete data; sample size and missing data mechanism; partial likelihood; asymptotic inference with missing data; multiple imputation; likelihood
DOI
10.1080/03610926.2022.2152287
Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
When are inferences (whether Direct-Likelihood, Bayesian, or Frequentist) obtained from partial data valid? This article answers this question by offering a new asymptotic theory about inference with missing data that is more general than existing theories. It proves that as the sample size increases and the extent of missingness decreases, the average-loglikelihood function generated by partial data and that ignores the missingness mechanism will converge in probability to that which would have been generated by complete data; and if the data are Missing at Random, this convergence depends only on sample size. Thus, inferences from partial data, such as posterior modes, confidence intervals, likelihood ratios, test statistics, and indeed, all quantities or features derived from the partial-data loglikelihood function, will be consistently estimated. Additionally, the missing data mechanism has asymptotically no effect on parameter estimation and hypothesis testing if the data are Missing at Random. This adds to previous research which has only proved the consistency and asymptotic normality of the posterior mode. Practical implications are discussed, and the theory is illustrated through simulation using a previous study of International Human Rights Law.
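
The convergence claim in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the article's own simulation: it assumes a N(theta, 1) model, deletes values completely at random (a special case of Missing at Random), and takes the "average log-likelihood" to be the log-likelihood divided by the number of contributing observations. The function names `avg_loglik` and `max_gap`, the true mean of 1.0, and the 30% missingness rate are all hypothetical choices made for illustration.

```python
import math
import random


def avg_loglik(sample, theta):
    """Average log-likelihood of a N(theta, 1) model over the given sample."""
    n = len(sample)
    sq = sum((x - theta) ** 2 for x in sample)
    return -0.5 * math.log(2 * math.pi) - sq / (2 * n)


def max_gap(n, miss_prob, thetas, rng):
    """Largest gap, over a grid of theta values, between the complete-data and
    observed-data average log-likelihoods when each value is deleted
    independently with probability miss_prob (assumed MCAR mechanism)."""
    complete = [rng.gauss(1.0, 1.0) for _ in range(n)]
    observed = [x for x in complete if rng.random() >= miss_prob]
    if not observed:  # guard against an empty observed sample for tiny n
        observed = complete[:1]
    return max(abs(avg_loglik(complete, t) - avg_loglik(observed, t))
               for t in thetas)


rng = random.Random(0)
thetas = [i / 10 for i in range(-10, 31)]  # grid of candidate means
for n in (100, 1000, 10000):
    # The printed gap should tend to shrink as n grows.
    print(n, max_gap(n, 0.3, thetas, rng))
```

In this sketch the missingness mechanism is ignored when the observed-data function is computed, which mirrors the setting the abstract describes. The gap is measured over a whole grid of parameter values rather than at a single point because the abstract's claim concerns the log-likelihood function itself.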
Pages: 3292-3311
Number of pages: 20