Classification of Poverty Condition Using Natural Language Processing

Cited by: 0
Authors
Guberney Muñetón-Santa
Daniel Escobar-Grisales
Felipe Orlando López-Pabón
Paula Andrea Pérez-Toro
Juan Rafael Orozco-Arroyave
Affiliations
[1] Universidad de Antioquia, GITA Lab, Faculty of Engineering
[2] Universidad de Antioquia, Instituto de Estudios Regionales
[3] Friedrich-Alexander-Universität, Pattern Recognition Lab
Source
Social Indicators Research | 2022 / Vol. 162
Keywords
Poverty; Natural language processing; Text classification; Word embedding; Document-level embedding; Machine learning
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
This work introduces a methodology to distinguish between poor and extremely poor people using Natural Language Processing. The approach serves as a baseline for understanding and classifying poverty from people's discourse using machine learning algorithms. Based on classical and modern word vector representations, we propose two strategies for document-level representations: (1) document-level features based on the concatenation of descriptive statistics and (2) Gaussian mixture models. Three classification methods are systematically evaluated: Support Vector Machines, Random Forest, and Extreme Gradient Boosting. The fourth best experiments yielded an accuracy of around 55%, while the embeddings based on GloVe word vectors yielded a sensitivity of 79.6%, which could be of great interest to public policy makers seeking to accurately identify people who need to be prioritized in social programs.
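As an illustration of strategy (1) in the abstract, the following is a minimal sketch of a document-level representation built by concatenating descriptive statistics of a document's word vectors. The function name `document_features`, the choice of statistics (mean, standard deviation, minimum, maximum), and the toy vectors are assumptions made for this example; the abstract does not specify which statistics the authors used.

```python
from statistics import mean, pstdev


def document_features(word_vectors):
    """Concatenate per-dimension statistics of a document's word vectors.

    Hypothetical helper: the statistics chosen here are an assumption,
    not the set reported by the authors.
    """
    if not word_vectors:
        raise ValueError("document has no word vectors")
    # Transpose so each tuple holds one embedding dimension across all words.
    dims = list(zip(*word_vectors))
    stats = (mean, pstdev, min, max)
    # One block per statistic: [means..., stds..., mins..., maxs...]
    return [float(stat(dim)) for stat in stats for dim in dims]


# Toy 3-dimensional "embeddings" for a three-word document (made-up values).
doc = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0], [2.0, 3.0, 2.0]]
features = document_features(doc)
print(len(features))  # 4 statistics x 3 dimensions
```

Under these assumptions, every document maps to a vector of fixed length (number of statistics times embedding dimension) regardless of how many words it contains, which is what allows a classifier such as a Support Vector Machine or Random Forest to be trained on it.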
Pages: 1413-1435
Page count: 22