Estimating Homophily in Social Networks Using Dyadic Predictions

Cited by: 1
Authors
Berry, George [1]
Sirianni, Antonio [2]
Weber, Ingmar [3]
An, Jisun [4]
Macy, Michael [1]
Affiliations
[1] Cornell Univ, Dept Sociol, Ithaca, NY 14853 USA
[2] Dartmouth Coll, Dept Sociol, Hanover, NH 03755 USA
[3] Qatar Comp Res Inst, Doha, Qatar
[4] Singapore Management Univ, Sch Comp & Informat Syst, Singapore, Singapore
Keywords
homophily; networks; machine learning; quantitative methodology; SEGREGATION; CORE
DOI
10.15195/v8.a14
Chinese Library Classification (CLC)
C91 [Sociology]
Discipline classification code
030301; 1204
Abstract
Predictions of node categories are commonly used to estimate homophily and other relational properties in networks. However, little is known about the validity of using predictions for this task. We show that estimating homophily in a network is a problem of predicting categories of dyads (edges) in the graph. Homophily estimates are unbiased when predictions of dyad categories are unbiased. Node-level prediction models, such as the use of names to classify ethnicity or gender, do not generally produce unbiased predictions of dyad categories and therefore produce biased homophily estimates. Bias comes from three sources: sampling bias, correlation between model errors and node degree, and correlation between node-level model errors along dyads. We examine three methods for estimating homophily: predicting node categories, predicting dyad categories, and a hybrid "ego-alter" approach. This analysis indicates that only the dyadic prediction approach is unbiased, whereas the node-level approach produces both high bias and high overall error. We find that node-level classification performance is not a reliable indicator of accuracy for homophily. Although this article focuses on a particular version of homophily, results generalize to heterophilous cases and other dyadic measures. We conclude with suggestions for research design. Code for this article is available at https://github.com/georgeberry/autocorr.
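The abstract's point that node-level accuracy does not pin down homophily accuracy can be illustrated with a minimal sketch. It is not the authors' code (that is in the repository linked above). It assumes a simple homophily measure, the share of edges joining two nodes of the same category, which may differ from the measure used in the article. The toy graph, the labels, the two classifiers, and the function names are all hypothetical.

```python
def same_category_share(edges, labels):
    """Share of edges whose two endpoints carry the same category."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def node_accuracy(true_labels, predicted_labels):
    """Share of nodes whose predicted category matches the true one."""
    hits = sum(1 for n in true_labels if predicted_labels[n] == true_labels[n])
    return hits / len(true_labels)


# Hypothetical toy graph: nodes 0-3 are category "a", nodes 4-7 are "b".
# Nodes 0 and 4 are hubs (degree 3); node 5 is a leaf (degree 1).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (4, 5), (4, 6), (4, 7), (3, 7)]
true_labels = {n: "a" if n < 4 else "b" for n in range(8)}

# Two hypothetical node-level classifiers, each wrong on exactly one node.
hub_error = {**true_labels, 0: "b"}   # error on a high-degree node
leaf_error = {**true_labels, 5: "a"}  # error on a low-degree node

true_share = same_category_share(edges, true_labels)
hub_share = same_category_share(edges, hub_error)
leaf_share = same_category_share(edges, leaf_error)

print("true:", true_share)
print("hub error:", hub_share, "accuracy:", node_accuracy(true_labels, hub_error))
print("leaf error:", leaf_share, "accuracy:", node_accuracy(true_labels, leaf_error))
```

Both classifiers label 7 of 8 nodes correctly, yet the same-category share moves from 0.875 to 0.5 when the error falls on a hub and only to 0.75 when it falls on a leaf. In this toy setting, that is the correlation between model errors and node degree that the abstract lists as a source of bias.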
Pages: 285-307
Page count: 23