A systematic process for Mining Software Repositories: Results from a systematic literature review

Cited by: 17
Authors
Vidoni, M. [1]
Affiliations
[1] Australian Natl Univ, CECS Sch Comp, Canberra, ACT, Australia
Keywords
Mining Software Repositories; Systematic literature review; Evidence-based software engineering; Guidelines; GITHUB; CLASSIFICATION; DEVELOPERS; PROJECTS; DATASET; FLOW
DOI
10.1016/j.infsof.2021.106791
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Context: Mining Software Repositories (MSR) is a growing area of Software Engineering (SE) research. Since their emergence in 2004, many investigations have analysed different aspects of these studies. However, there are no guidelines on how to conduct systematic MSR studies. There is a need to evaluate how MSR research is approached to provide a framework to do so systematically.
Objective: To identify how MSR studies are conducted in terms of repository selection and data extraction. To uncover potential for improvement in directing systematic research and providing guidelines to do so.
Method: A systematic literature review of MSR studies was conducted following the guidelines and template proposed by Mian et al. (which refines those provided by Kitchenham and Charters). These guidelines were extended and revised to provide a framework for systematic MSR studies.
Results: MSR studies typically do not follow a systematic approach for repository selection, and many do not report selection or data extraction protocols. Furthermore, few manuscripts discuss threats to the study's validity due to the selection or data extraction steps followed.
Conclusions: Although MSR studies are evidence-based research, they seldom follow a systematic process. Hence, there is a need for guidelines on how to conduct systematic MSR studies. New guidelines and a template have been proposed, consolidating related studies in the MSR field and strategies for systematic literature reviews.
Pages: 17