Using Language-Based Search in Mining Large Software Repositories

Cited by: 1
Author
Abu Bakar, Normi Sham Awang [1]
Affiliation
[1] Int Islamic Univ Malaysia, Kuala Lumpur 53100, Malaysia
Source
COMPUTATIONAL LINGUISTICS AND RELATED FIELDS | 2011 / Vol. 27
Keywords
Data retrieval; Software repository; Language-based search; Automation; Software quality
DOI
10.1016/j.sbspro.2011.10.594
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
The language component plays an important role in data and information retrieval. In software engineering, data retrieval is often hindered by the difficulty of obtaining data from commercial software. The emergence of open source repositories has contributed tremendously to the collection of software data. This paper highlights a data retrieval method for mining software data from a vast open source software repository, SourceForge. To automate data retrieval from the repository, a parser was written in the Python programming language, based on a pattern-matching algorithm. The retrieved data were later used to estimate the quality of the open source software. (C) 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of PACLING Organizing Committee.
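As an illustration of the kind of pattern-matching parser the abstract describes, the following is a minimal sketch only, not the author's implementation. It assumes regular expressions as the matching technique, and the page markup, field names, and the function `parse_project_page` are all hypothetical; the abstract does not specify the actual SourceForge page layout or the fields collected.

```python
import re

# Hypothetical project-page markup, invented for this example.
# The real SourceForge layout is not given in the abstract.
SAMPLE_PAGE = """
<h1 class="project-name">Example Project</h1>
<span class="downloads">12,345</span>
<span class="bugs-open">17</span>
<span class="bugs-total">240</span>
"""

# One compiled pattern per field; each captures the field's text in group 1.
FIELD_PATTERNS = {
    "name": re.compile(r'<h1 class="project-name">(.*?)</h1>'),
    "downloads": re.compile(r'<span class="downloads">([\d,]+)</span>'),
    "bugs_open": re.compile(r'<span class="bugs-open">(\d+)</span>'),
    "bugs_total": re.compile(r'<span class="bugs-total">(\d+)</span>'),
}


def parse_project_page(html):
    """Extract the assumed fields from one page; missing fields map to None."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(html)
        if match is None:
            record[field] = None
            continue
        value = match.group(1).strip()
        if field != "name":
            # Numeric fields may contain thousands separators.
            value = int(value.replace(",", ""))
        record[field] = value
    return record


if __name__ == "__main__":
    print(parse_project_page(SAMPLE_PAGE))
```

Records produced this way could feed a quality estimate such as the share of bugs still open, though which measures the paper actually uses is not stated in the abstract.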
Pages: 160-168
Page count: 9