Boa: Ultra-Large-Scale Software Repository and Source-Code Mining

Cited by: 58
Authors
Dyer, Robert [1]
Nguyen, Hoan Anh [2]
Rajan, Hridesh [2]
Nguyen, Tien N. [2]
Affiliations
[1] Bowling Green State Univ, Bowling Green, OH 43403 USA
[2] Iowa State Univ, Ames, IA 50011 USA
Funding
US National Science Foundation
Keywords
Boa; mining software repositories; domain-specific language; scalable; ease of use; lower barrier to entry
DOI
10.1145/2803171
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
In today's software-centric world, ultra-large-scale software repositories, such as SourceForge, GitHub, and Google Code, are the new Library of Alexandria. They contain an enormous corpus of software and related information. Scientists and engineers alike are interested in analyzing this wealth of information. However, systematically extracting and analyzing relevant data from these repositories to test hypotheses is hard, and best left to mining software repository (MSR) experts! Mining source code in particular yields significant insights into software development artifacts and processes. Unfortunately, mining source code at a large scale remains a difficult task. Previous approaches had to either limit the scope of the projects studied, limit the scope of the mining task to be more coarse grained, or sacrifice studying the history of the code. In this article we address mining source code: (a) at a very large scale; (b) at a fine-grained level of detail; and (c) with full history information. To address these challenges, we present domain-specific language features for source-code mining in our language and infrastructure called Boa. The goal of Boa is to ease testing MSR-related hypotheses. Our evaluation demonstrates that Boa substantially reduces programming effort, thus lowering the barrier to entry. We also show drastic improvements in scalability.
Pages: 34