A Review of Big Data and Machine Learning Operations in Official Statistics: MLOps and Feature Store Adoption

Cited by: 0
Authors
Ramos Nunes, Carlos Eduardo [1]
Ashofteh, Afshin [1]
Affiliations
[1] Nova Univ Lisbon, NOVA Informat Management Sch NOVA IMS, Lisbon, Portugal
Source
2024 IEEE 48TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC 2024 | 2024
Keywords
Feature store; Official statistics; Machine learning operations; Data science; Big data; Data quality; QUALITY
DOI
10.1109/COMPSAC61105.2024.00101
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Integrating machine learning (ML) into the official statisticians' toolset is gaining popularity as National Statistical Offices (NSOs) strive to improve their methodologies. This trend poses new challenges and implications for incorporating innovative techniques that ensure the reliability of the official statistical production process. A comprehensive literature review was conducted using Scopus and Web of Science databases to explore the contemporary applications of data science in official statistics. A total of 178 research articles were identified, focusing on areas such as big data, machine learning, and data quality. While the literature review revealed extensive proposals on utilizing alternative data and applying machine learning techniques to support official statistics production, it also identified research gaps in the post-training steps of the machine learning process. Areas requiring further investigation include machine learning operations in a production environment, data quality assurance, and governance.
Pages: 711-718
Number of pages: 8