Model-Assisted Estimation Through Random Forests in Finite Population Sampling

Cited by: 19
Authors
Dagdoug, Mehdi [1]
Goga, Camelia [1]
Haziza, David [2]
Affiliations
[1] Univ Bourgogne Franche Comte, Lab Math Besancon, Besancon, France
[2] Univ Ottawa, Dept Math & Stat, 150 Louis Pasteur Private, Ottawa, ON K1N 6N5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Model-assisted approach; Model-calibration; Nonparametric regression; Random forest; Survey data; Variance estimation; ASYMPTOTIC CONFIDENCE BANDS; AUXILIARY INFORMATION; VARIANCE REDUCTION; SURVEY DESIGN; APPROXIMATION;
DOI
10.1080/01621459.2021.1987250
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
In surveys, the interest lies in estimating finite population parameters such as population totals and means. In most surveys, some auxiliary information is available at the estimation stage. This information may be incorporated into the estimation procedures to increase their precision. In this article, we use random forests (RFs) to estimate the functional relationship between the survey variable and the auxiliary variables. In recent years, RFs have become attractive because National Statistical Offices now have access to a variety of data sources, potentially containing a large number of observations on a large number of variables. We establish the theoretical properties of model-assisted procedures based on RFs and derive corresponding variance estimators. A model-calibration procedure for handling multiple survey variables is also discussed. The results of a simulation study suggest that the proposed point and variance estimation procedures perform well in terms of bias, efficiency, and coverage of normal-based confidence intervals in a wide variety of settings. Finally, we apply the proposed methods to data on radio audiences collected by Mediametrie, a French audience company. Supplementary materials for this article are available online.
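As a rough illustration of the model-assisted idea summarized in the abstract, the sketch below computes a generic difference-type estimator of a population total: the sum of predictions over the whole population plus the design-weighted sum of sample residuals. This is the standard textbook form, not necessarily the article's exact estimator. The function name `model_assisted_total`, the argument names, and the toy numbers are all assumptions made for illustration, and `predict` is a stand-in for a fitted random forest.

```python
from typing import Callable, Sequence


def model_assisted_total(
    pop_x: Sequence[float],
    sample_idx: Sequence[int],
    sample_y: Sequence[float],
    incl_prob: Sequence[float],
    predict: Callable[[float], float],
) -> float:
    """Difference-type model-assisted estimator of a population total.

    pop_x      auxiliary variable, known for every population unit
    sample_idx population indices of the sampled units
    sample_y   survey variable, observed on the sample only
    incl_prob  first-order inclusion probabilities of the sampled units
    predict    fitted prediction function (hypothetical stand-in for a
               random forest fitted on the sample)
    """
    # Sum of model predictions over the whole population.
    pred_total = sum(predict(x) for x in pop_x)
    # Design-weighted sum of residuals over the sample.
    resid_total = sum(
        (y - predict(pop_x[k])) / pi
        for k, y, pi in zip(sample_idx, sample_y, incl_prob)
    )
    return pred_total + resid_total


# Toy population (made-up numbers): y = 2x exactly, N = 6, n = 3.
pop_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pop_y = [2.0 * x for x in pop_x]
sample_idx = [0, 2, 5]
sample_y = [pop_y[k] for k in sample_idx]
incl_prob = [0.5, 0.5, 0.5]  # equal inclusion probabilities n/N

# With a perfect predictor the residuals vanish and the true total is recovered.
perfect = model_assisted_total(
    pop_x, sample_idx, sample_y, incl_prob, lambda x: 2.0 * x
)
# With a predictor that is identically zero, the estimator reduces to
# the Horvitz-Thompson estimator.
ht = model_assisted_total(
    pop_x, sample_idx, sample_y, incl_prob, lambda x: 0.0
)
```

The two calls show the limiting cases: the better the predictions track the survey variable, the smaller the residual term and the less the estimate depends on which units were sampled.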
Pages: 1234-1251
Number of pages: 18