Predicting perceived ethnicity with data on personal names in Russia

Cited by: 0
Authors
Alexey Bessudnov
Denis Tarasov
Viacheslav Panasovets
Veronica Kostenko
Ivan Smirnov
Vladimir Uspenskiy
Affiliations
[1] University of Exeter, Social and Political Sciences, Philosophy and Anthropology
[2] Constructor University Bremen, Computer Science
[3] St Petersburg State University, Applied Mathematics and Control Processes
[4] Sociology, Computational Social Sciences and Humanities
[5] European University at St Petersburg, Digital Transformation
[6] RWTH Aachen University
[7] ITMO University
Source
Journal of Computational Social Science | 2023 / Vol. 6
Keywords
Ethnicity; Russia; Machine learning; Prediction; Personal names
DOI
Not available
Abstract
In this paper, we develop a machine learning classifier that predicts perceived ethnicity from personal names for the major ethnic groups living in Russia. We collect data from VK, the largest Russian social media website. Ethnicity was coded from the languages spoken by users and their geographical location, and the data were manually cleaned by crowd workers. The classifier achieves an accuracy of 0.82 for a scheme with 24 ethnic groups and 0.92 for 15 aggregated ethnic groups. It can be used for research on ethnicity and ethnic relations in Russia with data sets that contain personal names but not ethnicity.
Pages: 589-608
Page count: 19