Modelling Wealth from Call Detail Records and Survey Data with Machine Learning: Evidence from Papua New Guinea

Times cited: 0

Authors:

Khaefi, Muhammad Rizal [1]

Hendrik

Burra, Dharani Dhar

Dianco, Rio Fandi

Alkarisya, Dikara Maitri Pradipta

Muztahid, Muhammad Rheza

Zahara, Annissa

Hodge, George

Idzalika, Rajius

Affiliation:

[1] Pulse Lab Jakarta, United Nations Global Pulse, Jakarta, Indonesia

Source:

2019 IEEE International Conference on Big Data (Big Data) | 2019

Keywords:

Wealth prediction; mobile network data; survey; regression; machine learning; regularization; selection

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Call detail records (CDRs) provide a significant opportunity to understand human development at high spatiotemporal resolution, especially in developing countries, which face financial, human, and capacity constraints. This study attempts to model and identify the features derived from CDRs that best predict relative wealth and poverty across Papua New Guinea (PNG) by combining them with tele-survey data. Our findings show promising results for predicting dichotomous variables consisting of self-reported household assets, with Area Under the Curve (AUC) scores of 0.88 or higher. Meanwhile, predicting the numerical wealth index, which was built using a dimensionality reduction method, did not provide satisfactory results. For the numerical wealth index derived from Principal Component Analysis (PCA), the Root Mean Squared Error (RMSE) of 0.69 was only slightly lower than the index's standard deviation of 0.74. The numerical index was further binned into quintiles, which were used separately as a response variable in a multi-class classification approach. The best F1 score for multi-class classification of the PCA-derived quintiles was 0.7. Findings from the index and quintiles derived from Multiple Correspondence Analysis (MCA) add robustness to our study. The overall results suggest that, in the case of PNG, CDRs are better suited to developing proxy indicators for individual household assets and for quintile-based wealth classes than for a numerical relative wealth index. In general, our results add confidence to harnessing mobile network data to predict wealth and poverty.
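The abstract names several evaluation steps: a PCA-derived wealth index, binning that index into quintiles, comparing RMSE against the index's standard deviation, and scoring a multi-class quintile classifier with F1. The following is a minimal pure-Python sketch of those steps on a hypothetical toy asset table; it is not the authors' code. It assumes the index is the first principal component of z-scored binary asset indicators, that quintiles are assigned by rank, and that F1 is macro-averaged (the abstract does not say which averaging was used). All function names and data are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(columns):
    """Z-score each column (population SD); a constant column becomes all zeros."""
    out = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return out

def first_component(z, iters=500):
    """Leading eigenvector of the correlation matrix of standardized columns z,
    found by power iteration (assumes a well-separated leading eigenvalue)."""
    k, n = len(z), len(z[0])
    corr = [[sum(z[i][t] * z[j][t] for t in range(n)) / n for j in range(k)]
            for i in range(k)]
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the component so that owning more assets raises the index.
    return [-x for x in v] if sum(v) < 0 else v

def asset_index(rows):
    """Assumption: index = projection of each household's z-scored assets on PC1."""
    z = standardize([list(col) for col in zip(*rows)])
    v = first_component(z)
    return [sum(v[j] * z[j][t] for j in range(len(v))) for t in range(len(rows))]

def quintiles(scores):
    """Rank-based quintile labels, 1 (poorest) to 5 (wealthiest)."""
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    labels = [0] * len(scores)
    for rank, t in enumerate(order):
        labels[t] = rank * 5 // len(scores) + 1
    return labels

def rmse(y_true, y_pred):
    return math.sqrt(mean((a - b) ** 2 for a, b in zip(y_true, y_pred)))

def macro_f1(y_true, y_pred, labels):
    """Assumption: unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    per_class = []
    for c in labels:
        tp = sum(a == c and b == c for a, b in zip(y_true, y_pred))
        fp = sum(a != c and b == c for a, b in zip(y_true, y_pred))
        fn = sum(a == c and b != c for a, b in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return mean(per_class)

# Hypothetical toy survey: 10 households x 4 binary asset indicators.
rows = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]] * 2
index = asset_index(rows)
q = quintiles(index)
# Trivial baseline: predict the mean index for every household.
baseline = rmse(index, [mean(index)] * len(index))
```

Predicting the mean for every household yields an RMSE equal to the population standard deviation of the index, so an RMSE of 0.69 against a standard deviation of 0.74 is only a modest improvement over that trivial baseline.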


Pages: 2855-2864

Number of pages: 10
