Sourcerer: An infrastructure for large-scale collection and analysis of open-source code

Cited by: 55
Authors
Bajracharya, Sushil [1]
Ossher, Joel [1]
Lopes, Cristina [1]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92697 USA
Keywords
Open source; Internet-scale code retrieval; Data mining; Sourcerer; Static analysis; Software information retrieval; SOFTWARE; SEARCH; REUSE;
DOI
10.1016/j.scico.2012.04.008
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code. However, collecting and analyzing such a large quantity of source code presents a number of challenges. Although the current generation of open source code search engines provides access to the source code in an aggregated repository, they generally fail to take advantage of the rich structural information contained in the code they index. This makes them significantly less useful than Sourcerer for building state-of-the-art software engineering tools, as these tools often require access to both the structural and textual information available in source code. We have developed Sourcerer, an infrastructure for large-scale collection and analysis of open source code. By taking full advantage of the structural information extracted from source code in its repository, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer. (C) 2012 Elsevier B.V. All rights reserved.
Pages: 241-259
Number of pages: 19