Boa: Ultra-Large-Scale Software Repository and Source-Code Mining

Times cited: 58
Authors
Dyer, Robert [1]
Nguyen, Hoan Anh [2]
Rajan, Hridesh [2]
Nguyen, Tien N. [2]
Affiliations
[1] Bowling Green State Univ, Bowling Green, OH 43403 USA
[2] Iowa State Univ, Ames, IA 50011 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Boa; mining software repositories; domain-specific language; scalable; ease of use; lower barrier to entry
DOI
10.1145/2803171
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
In today's software-centric world, ultra-large-scale software repositories, such as SourceForge, GitHub, and Google Code, are the new library of Alexandria. They contain an enormous corpus of software and related information. Scientists and engineers alike are interested in analyzing this wealth of information. However, systematic extraction and analysis of relevant data from these repositories for testing hypotheses is hard, and best left for mining software repository (MSR) experts! Specifically, mining source code yields significant insights into software development artifacts and processes. Unfortunately, mining source code at a large scale remains a difficult task. Previous approaches had to either limit the scope of the projects studied, limit the scope of the mining task to be more coarse grained, or sacrifice studying the history of the code. In this article we address mining source code: (a) at a very large scale; (b) at a fine-grained level of detail; and (c) with full history information. To address these challenges, we present domain-specific language features for source-code mining in our language and infrastructure called Boa. The goal of Boa is to ease testing MSR-related hypotheses. Our evaluation demonstrates that Boa substantially reduces programming efforts, thus lowering the barrier to entry. We also show drastic improvements in scalability.
Pages: 34