Classification of Poverty Condition Using Natural Language Processing

Cited by: 0

Authors:

Guberney Muñetón-Santa

Daniel Escobar-Grisales

Felipe Orlando López-Pabón

Paula Andrea Pérez-Toro

Juan Rafael Orozco-Arroyave

Affiliations:

[1] Universidad de Antioquia, GITA Lab, Faculty of Engineering

[2] Universidad de Antioquia, Instituto de Estudios Regionales

[3] Friedrich-Alexander-Universität, Pattern Recognition Lab

Source:

Social Indicators Research | 2022 / Vol. 162

Keywords:

Poverty; Natural language processing; Text classification; Word embedding; Document-level embedding; Machine learning

DOI:

Not available

Abstract:

This work introduces a methodology to classify people as poor or extremely poor using Natural Language Processing. The approach serves as a baseline for understanding and classifying poverty from people's discourse using machine learning algorithms. Based on classical and modern word vector representations, we propose two strategies for document-level representations: (1) document-level features based on the concatenation of descriptive statistics and (2) Gaussian mixture models. Three classification methods are systematically evaluated: Support Vector Machines, Random Forest, and Extreme Gradient Boosting. The four best experiments yielded an accuracy of around 55%, while the embeddings based on GloVe word vectors yielded a sensitivity of 79.6%, which could be of great interest to public policy makers seeking to accurately identify people who need to be prioritized in social programs.
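
As a rough illustration of strategy (1), the sketch below builds a document-level feature vector by concatenating per-dimension descriptive statistics of a document's word vectors. The abstract does not say which statistics are used, so the choice here (mean, standard deviation, minimum, maximum) is an assumption, and the function name `document_features` and the toy vectors are hypothetical.

```python
from statistics import mean, pstdev


def document_features(word_vectors):
    """Concatenate per-dimension statistics of a document's word vectors.

    Assumption: the statistics (mean, population std, min, max) are chosen
    for illustration only; the abstract does not list the ones actually used.
    """
    if not word_vectors:
        raise ValueError("document has no word vectors")
    # Transpose so that each tuple holds one embedding dimension across all words.
    dims = list(zip(*word_vectors))
    stats = (mean, pstdev, min, max)
    # Output layout: all means, then all stds, then all mins, then all maxes.
    return [float(stat(dim)) for stat in stats for dim in dims]


# Toy 2-dimensional "embeddings" for a three-word document (made-up values).
toy_document = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
features = document_features(toy_document)
```

With d-dimensional word vectors and four statistics, every document maps to a fixed-length vector of 4d values regardless of how many words it contains, which is what allows a standard classifier such as a Support Vector Machine to consume it.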

Citation

Pages: 1413-1435

Page count: 22
