Mining the Network of the Programmers: A Data-Driven Analysis of GitHub

Cited by: 3
Authors
Ma, Yezhou [1]
Li, Huiying [1]
Hu, Jiyao [1]
Xie, Rong [1]
Chen, Yang [1]
Affiliation
[1] Fudan University, School of Computer Science, Shanghai, People's Republic of China
Source
12TH CHINESE CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING (CHINESECSCW 2017) | 2017
Funding
National Natural Science Foundation of China; Natural Science Foundation of Shanghai
Keywords
GitHub; professional social networks; PageRank; machine learning; spatial-temporal analysis
DOI
10.1145/3127404.3127431
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
GitHub is a popular website for version control and source code management used worldwide. Because its users can follow each other, it also forms a professional social network of millions of users. In this work, we perform a data-driven study of the GitHub network. By introducing a distributed crawling framework, we first collect the profiles and behavioral data of more than 2 million GitHub users. To the best of our knowledge, this is the largest and most recent public dataset of GitHub. We then build the social graph of these users and conduct a thorough analysis of its network structure. Moreover, we investigate user behavior patterns, particularly the patterns of "commit" activities. Finally, we use machine learning methods to discover important users in the network with high accuracy and low overhead. Our findings can help GitHub provide better services to its users.
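The keywords list PageRank, and the abstract describes identifying important users in the follow graph, but this record does not spell out the exact pipeline. As a minimal sketch, assuming user importance is scored with PageRank over the follow graph (an edge u -> v when u follows v, so v gains score from its followers), power iteration could look like the code below. The function name `pagerank`, the damping factor 0.85, the tolerance, and the sample user names are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank over a directed follow graph (illustrative sketch).

    edges: iterable of (follower, followee) pairs. Assumption: an edge u -> v
    means "u follows v", so v receives a share of u's score.
    Returns a dict mapping each user to a score; scores sum to 1.
    """
    out_links = defaultdict(set)
    nodes = set()
    for u, v in edges:
        if u != v:  # ignore self-follows
            out_links[u].add(v)
        nodes.update((u, v))

    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        # Users who follow nobody (dangling nodes) spread their score uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each user splits its damped score evenly among the users it follows.
        for u, targets in out_links.items():
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy follow graph: three users follow "bob", and "bob" follows "alice".
follows = [("alice", "bob"), ("carol", "bob"), ("dave", "bob"), ("bob", "alice")]
scores = pagerank(follows)
print(max(scores, key=scores.get))  # bob
```

In this toy graph "bob" ranks highest because he has the most followers, and "alice" ranks second because she is followed by "bob". For a graph of more than 2 million users, a sparse-matrix or distributed implementation would likely be preferable to plain Python dictionaries.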
Pages: 165-168
Number of pages: 4