Analyzing and mining a code search engine usage log

Cited by: 63
Authors
Bajracharya, Sushil Krishna [1]
Lopes, Cristina Videira [1]
Affiliation
[1] Univ Calif Irvine, Dept Informat, Donald Bren Sch Informat & Comp Sci, Irvine, CA 92697 USA
Funding
National Science Foundation (USA)
Keywords
Code search engine; Usage log analysis; Mining topics; SOFTWARE DEVELOPERS; QUERY EXPANSION; WEB; INFORMATION; RETRIEVAL; SYSTEM
DOI
10.1007/s10664-010-9144-6
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
This paper presents an analysis of a year-long usage log of Koders, the first commercially available Internet-scale code search engine (http://www.koders.com). The usage log comprises about ten million activities from more than three million users. Analysis of the usage data shows that, despite attracting a large number of visitors, Koders has very sparse usage and lacks regular usage from many of its users. Search behavior in Koders shows many patterns similar to those observed in Web search. A topic modeling analysis of the usage data reveals which topics Koders users are looking for. Observations on how prevalent these topics are among users, and on how search and download activities vary across topics, lead to the conclusion that the users who find code search engines usable are those who already know, with a high level of specificity, what they are looking for. This paper also presents a general categorization of these topics that provides insight into the different ways code search engine users express their queries. It identifies the various forms of queries in the Koders log and the kinds of results those queries target. It also offers several suggestions for improving code search engines based on the analysis of usage, topics, and query forms. This work is the first of its kind to reveal insights into the usage of an Internet-scale code search engine.
Pages: 424-466
Number of pages: 43