Structural effects of network sampling coverage I: Nodes missing at random

Cited by: 138
Authors
Smith, Jeffrey A. [1]
Moody, James [2]
Affiliations
[1] Univ Nebraska Lincoln, Lincoln, NE 68588 USA
[2] Duke Univ, Durham, NC 27706 USA
Keywords
Missing data; Network sampling; Network bias; Social networks; Centrality measures; Models; Stability; Error
DOI
10.1016/j.socnet.2013.09.003
Chinese Library Classification (CLC) number
Q98 [Anthropology]
Discipline classification code
030303
Abstract
Network measures assume a census of a well-bounded population. This level of coverage is rarely achieved in practice, however, and we have only limited information on the robustness of network measures to incomplete coverage. This paper examines the effect of node-level missingness on four classes of network measures (centrality, centralization, topology, and homophily) across a diverse sample of 12 empirical networks. We use a Monte Carlo simulation process to generate data with known levels of missingness and compare the resulting network scores to their known starting values. As in past studies (Borgatti et al., 2006; Kossinets, 2006), we find that measurement bias generally increases with more missing data. The exact rate and nature of this increase, however, vary systematically across network measures. For example, betweenness and Bonacich centralization are quite sensitive to missing data, while closeness and in-degree are robust. Similarly, while the tau statistic and distance are difficult to capture with missing data, transitivity shows little bias even with very high levels of missingness. The results are also clearly dependent on the features of the network. Larger, more centralized networks are generally more robust to missing data, but this is especially true for centrality and centralization measures. More cohesive networks are robust to missing data when measuring topological features but not when measuring centralization. Overall, the results suggest that missing data may have quite large or quite small effects on network measurement, depending on the type of network and the question being posed. © 2013 Elsevier B.V. All rights reserved.
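The Monte Carlo procedure described in the abstract (remove nodes at random, recompute a measure, compare it with the value from the complete network) can be illustrated with a minimal sketch. This is not the authors' code. It assumes an undirected toy random graph rather than the 12 empirical networks, uses plain degree as the only measure, and scores agreement by the Pearson correlation between true and observed degree among the retained nodes. All function names and parameter values here are hypothetical choices made for illustration.

```python
import math
import random


def random_graph(n, p, rng):
    """Toy undirected random graph as {node: set(neighbors)} (assumed test data)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def induced_subgraph(adj, keep):
    """Keep only nodes in `keep`; ties to removed nodes are lost."""
    return {u: adj[u] & keep for u in keep}


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def simulate(adj, missing_frac, n_draws, rng):
    """Mean correlation of true vs. observed degree over random node removals."""
    nodes = sorted(adj)
    true_deg = {u: len(adj[u]) for u in nodes}
    n_keep = round(len(nodes) * (1 - missing_frac))
    scores = []
    for _ in range(n_draws):
        # Nodes missing at random: every node is equally likely to be dropped.
        keep = set(rng.sample(nodes, n_keep))
        observed = induced_subgraph(adj, keep)
        kept = sorted(keep)
        r = pearson([true_deg[u] for u in kept],
                    [len(observed[u]) for u in kept])
        if not math.isnan(r):
            scores.append(r)
    return sum(scores) / len(scores) if scores else float("nan")


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    graph = random_graph(60, 0.1, rng)
    for frac in (0.1, 0.3, 0.5, 0.7):
        print(f"missing={frac:.0%}  mean r={simulate(graph, frac, 200, rng):.3f}")
```

Swapping the degree calculation for another measure (betweenness, transitivity, and so on) would follow the same pattern: compute it once on the full graph, then on each induced subgraph, and summarize the discrepancy across draws.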
Pages: 652-668
Page count: 17
Related papers
50 in total
[1] A nonparametric approach to matched pairs with missing data [J]. Akritas, MG; Kuha, J; Osgood, DW. Sociological Methods & Research, 2002, 30 (03): 425-454
[2] Statistical mechanics of complex networks [J]. Albert, R; Barabási, AL. Reviews of Modern Physics, 2002, 74 (01): 47-97
[3] Error and attack tolerance of complex networks [J]. Albert, R; Jeong, H; Barabási, AL. Nature, 2000, 406 (6794): 378-382
[4] Building stochastic blockmodels [J]. Anderson, CJ; Wasserman, S; Faust, K. Social Networks, 1992, 14 (1-2): 137-161
[5] [Anonymous], 2004, WOMEN HEALTH
[6] [Anonymous], 1983, Applied Network Analysis
[7] Bringing society back in survey research and macro-methodology [J]. Barton, AH. American Behavioral Scientist, 1968, 12 (02): 1-&
[8] Direct and indirect methods for structural equivalence [J]. Batagelj, V; Ferligoj, A; Doreian, P. Social Networks, 1992, 14 (1-2): 63-90
[9] On the robustness of centrality measures under conditions of imperfect data [J]. Borgatti, SP; Carley, KM; Krackhardt, D. Social Networks, 2006, 28 (02): 124-136
[10] Identification of peer effects through social networks [J]. Bramoulle, Yann; Djebbari, Habiba; Fortin, Bernard. Journal of Econometrics, 2009, 150 (01): 41-55