Learning to detect community smells in open source software projects

Times cited: 27
Authors
Almarimi, Nuri [1]
Ouni, Ali [1]
Mkaouer, Mohamed Wiem [2]
Affiliations
[1] Univ Quebec, ETS Montreal, Montreal, PQ, Canada
[2] Rochester Inst Technol, Rochester, NY 14623 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Community smells detection; Social debt; Socio-technical metrics; Machine learning; Prediction; Quality
DOI
10.1016/j.knosys.2020.106201
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Community smells are symptoms of organizational and social issues within a software development community that often lead to additional project costs. Recent studies have identified a variety of community smells and defined them as sub-optimal patterns in the organizational-social structures of the software development community. To detect potential community smells early in a software project, we introduce in this paper a novel machine learning-based detection approach, named CSDETECTOR, that learns from existing bad community development practices to provide automated support for detecting such smells. In particular, our approach learns from a set of organizational-social symptoms that characterize potential instances of community smells in a software project. We built a decision tree detection model, using the C4.5 classifier, to identify eight commonly occurring community smells in software projects. To evaluate the performance of our approach, we conducted an empirical study on a benchmark of 74 open source projects from GitHub. Our statistical results show that CSDETECTOR performs well, achieving an average accuracy of 96% and an AUC of 0.94. Moreover, CSDETECTOR outperforms two recent state-of-the-art techniques in terms of detection accuracy. Finally, we investigate which community-related metrics are most influential in identifying each community smell type. We found that the number of commits and developers per time zone, the number of developers per community, and the betweenness and closeness centrality of the social network are the most influential community characteristics. (C) 2020 Elsevier B.V. All rights reserved.
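The abstract names C4.5 as the learner behind CSDETECTOR. As a rough illustration of how a C4.5-style tree chooses a split on a numeric socio-technical metric, the following minimal Python sketch computes the gain ratio (information gain divided by split information) for candidate thresholds. It is a simplified sketch, not the authors' implementation: it picks the threshold directly by gain ratio, whereas full C4.5 also applies an average-gain condition, handles missing values, recurses, and prunes. The function names, the metric "developers per time zone" as a single feature, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting a numeric feature at `threshold` (values <= threshold go left)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    # Information gain: parent entropy minus weighted child entropy.
    info_gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    # Split information penalizes splits that fragment the data.
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return (threshold, gain ratio)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return (None, 0.0)
    return max(((t, gain_ratio(values, labels, t)) for t in candidates), key=lambda p: p[1])


if __name__ == "__main__":
    # Hypothetical toy data: developers per time zone, and whether a smell was labeled (1) or not (0).
    devs_per_timezone = [1, 2, 2, 3, 8, 9, 10, 12]
    smelly = [0, 0, 0, 0, 1, 1, 1, 1]
    print(best_threshold(devs_per_timezone, smelly))  # prints (5.5, 1.0)
```

On this toy data the split at 5.5 separates the two classes perfectly, so its gain ratio is 1.0. Dividing by split information is what distinguishes C4.5 from plain information gain (ID3): it discourages splits that merely break the data into many small fragments.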
Pages: 15