Automatic query reformulation for code search using crowdsourced knowledge

Cited by: 36
Authors
Rahman, Mohammad M. [1]
Roy, Chanchal K. [2]
Lo, David [3]
Affiliations
[1] Univ Saskatchewan, Saskatoon, SK, Canada
[2] Univ Saskatchewan, Software Engn Comp Sci, Saskatoon, SK, Canada
[3] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Code search; Query reformulation; Keyword-API association; Crowdsourced knowledge; Stack Overflow;
DOI
10.1007/s10664-018-9671-0
CLC number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Traditional code search engines (e.g., Krugle) often do not perform well with natural language queries. They mostly apply keyword matching between the query and the source code. Hence, they need carefully designed queries containing references to relevant APIs for the code search. Unfortunately, according to existing studies, preparing an effective search query is both challenging and time-consuming for developers. In this article, we propose a novel query reformulation technique, RACK, that suggests a list of relevant API classes for a natural language query intended for code search. Our technique offers such suggestions by exploiting keyword-API associations from the questions and answers of Stack Overflow (i.e., crowdsourced knowledge). We first motivate our idea using an exploratory study with 19 standard Java API packages and 344K Java-related posts from Stack Overflow. Experiments using 175 code search queries randomly chosen from three Java tutorial sites show that our technique recommends correct API classes within the Top-10 results for 83% of the queries, with 46% mean average precision and 54% recall, which are 66%, 79% and 87% higher, respectively, than those of the state of the art. Reformulations using our suggested API classes improve 64% of the natural language queries, and their overall accuracy improves by 19%. Comparisons with three state-of-the-art techniques demonstrate that RACK outperforms them in query reformulation by a statistically significant margin. Investigation using three web/code search engines shows that our technique can significantly improve their results in the context of code search.
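The idea of keyword-API association described in the abstract can be illustrated with a minimal sketch. This is not RACK's actual algorithm, which the abstract does not detail; it only assumes a simple co-occurrence count between question keywords and the API classes found in answers. The toy corpus, the stopword list, and the function names (`build_associations`, `suggest_apis`, `reformulate`) are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus: (question title, API classes found in an answer).
# A real study would mine these pairs from Stack Overflow posts.
POSTS = [
    ("How to read a text file line by line", ["BufferedReader", "FileReader"]),
    ("Read file contents into a string", ["Files", "Paths"]),
    ("Parse JSON string into object", ["JSONObject"]),
    ("Read large file efficiently", ["BufferedReader", "FileReader"]),
]

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"how", "to", "a", "an", "the", "by", "into", "in", "of"}


def keywords(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_associations(posts):
    """Count how often each keyword co-occurs with each API class."""
    assoc = defaultdict(Counter)
    for title, apis in posts:
        for kw in set(keywords(title)):
            for api in set(apis):
                assoc[kw][api] += 1
    return assoc


def suggest_apis(query, assoc, top_k=10):
    """Rank API classes by summed co-occurrence counts over the query keywords."""
    scores = Counter()
    for kw in set(keywords(query)):
        scores.update(assoc.get(kw, {}))
    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [api for api, _ in ranked[:top_k]]


def reformulate(query, assoc, top_k=3):
    """Append the top suggested API classes to the original query."""
    return " ".join([query] + suggest_apis(query, assoc, top_k))


if __name__ == "__main__":
    assoc = build_associations(POSTS)
    print(suggest_apis("read file line", assoc))
    print(reformulate("read file line", assoc, top_k=2))
```

Summing raw counts favors frequent API classes; a more careful variant would normalize the scores, but that choice is outside what the abstract states.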
Pages: 1869-1924
Page count: 56