An empirical study of code reuse between GitHub and Stack Overflow during software development

Cited by: 1
Authors
Chen, Xiangping [1, 2]
Xu, Furen [1, 3]
Huang, Yuan [1, 3]
Zhou, Xiaocong [4]
Zheng, Zibin [1, 3]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Peoples R China
[2] Sch Journalism & Commun, Guangzhou, Peoples R China
[3] Sch Software Engn, Guangzhou, Peoples R China
[4] Sch Comp Sci & Engn, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Stack Overflow; Code reuse; Code clone; Semantic analysis; STACKOVERFLOW
DOI
10.1016/j.jss.2024.111964
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
With the rise of programming Q&A websites (e.g., Stack Overflow) and the open-source movement, code reuse has become a common phenomenon. This study aims to provide a comprehensive analysis of programmers' code reuse behavior during software development. Specifically, we focus on code reuse between the code snippets in the commits of open-source projects and the code snippets on Stack Overflow (SO). The open-source Java project dataset we construct contains 793 projects with 342,148 modified code snippets, and the SO code dataset includes 1,355,617 posts. We employ a code clone detection tool to identify instances of code reuse between the modified code snippets of commits and the code snippets of SO posts. We find that the average code reuse ratio of the projects is 6.32%, with a significant upward trend expected in the future. Additionally, we find that experienced developers seem more likely to reuse code from SO, and that they prefer to reuse posts with more favorites and higher scores. We combine deep learning and topic analysis algorithms to fully exploit the semantic information of SO posts. The results show a difference in the distribution of post types reused by bug-related commits and non-bug-related commits. We also find that the code reuse ratio (14.44%) in Java class files that have undergone multiple modifications is more than double the overall code reuse ratio (6.32%). Finally, we discuss the reuse habits of programmers and find that they may refer to multiple posts in a single reuse, and that these posts are related to a certain extent. Based on these results, our study provides practical insights for different stakeholders: researchers, developers, and the SO platform.
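The abstract reports an average code reuse ratio across projects but does not define it here. As a minimal sketch, assuming the ratio for one project is the share of its modified code snippets that a clone detector matched to an SO snippet, and that the reported figure is the mean of the per-project ratios, the computation could look like the following. The function name, the project names, and the toy flags are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def reuse_ratio(reused_flags):
    """Share of a project's modified snippets flagged as reused.

    Assumed definition for illustration only: reused snippets divided by
    all modified snippets. A project with no snippets gets a ratio of 0.0.
    """
    if not reused_flags:
        return 0.0
    return sum(reused_flags) / len(reused_flags)


# Hypothetical toy data: one list per project, one flag per modified snippet.
# True means a clone detector matched the snippet to an SO code snippet.
projects = {
    "project_a": [True, False, False, False],
    "project_b": [False, False, False, False, False],
    "project_c": [True, True, False, False, False, False, False, False],
}

# Per-project ratios, then the unweighted mean across projects.
per_project = {name: reuse_ratio(flags) for name, flags in projects.items()}
average_ratio = mean(per_project.values())

for name, ratio in per_project.items():
    print(f"{name}: {ratio:.2%}")
print(f"average: {average_ratio:.2%}")
```

Averaging per-project ratios weights every project equally regardless of size. Pooling all snippets before dividing would instead weight large projects more heavily, so the two approaches can give different figures on the same data.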
Pages: 16