Multi-source statistics on employment status in Italy, a machine learning approach

Cited by: 1
Authors
Varriale, Roberta [1, 2]
Alfò, Marco [2]
Affiliations
[1] Istat Italian Natl Inst Stat, Directorate Methodol & Stat Proc Design, Via Balbo 16, I-00184 Rome, Italy
[2] Sapienza Univ Rome, Dept Stat Sci, Piazzale Aldo Moro 5, I-00185 Rome, Italy
Source
METRON-INTERNATIONAL JOURNAL OF STATISTICS | 2023, Vol. 81, No. 1
Keywords
Multi-source statistics; Employment status; Machine learning; Classification error
DOI
10.1007/s40300-023-00242-7
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
In recent decades, National Statistical Institutes have started to produce official statistics by exploiting multiple sources of information (multi-source statistics) rather than a single source, usually a statistical survey. In this context, one of the research projects addressed by the Italian National Statistical Institute (Istat) concerned methods for producing estimates on employment in Italy using survey data and administrative sources. The former are drawn from the Labour Force survey conducted by Istat, the latter from several administrative sources that Istat regularly acquires from external bodies. We use machine learning methods to predict the individual employment status. This approach is based on the application of decision tree and random forest techniques, which are frequently used to classify large amounts of data. We show how to construct a "new" response variable denoting agreement of the data sources: this approach is shown to maximise the information we may derive from a machine learning approach in some problematic cases. The methods have been applied using the R software.
Pages: 37-63
Number of pages: 27