ANNE: Improving Source Code Search using Entity Retrieval Approach

Times cited: 16
Authors
Vinayakarao, Venkatesh [1]
Sarma, Anita [2]
Purandare, Rahul [1]
Jain, Shuktika [1]
Jain, Saumya [1]
Affiliations
[1] IIIT Delhi, New Delhi, India
[2] Oregon State Univ, Corvallis, OR 97331 USA
Source
WSDM'17: PROCEEDINGS OF THE TENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING | 2017
Funding
National Science Foundation (USA)
Keywords
Code Search; Natural Language Processing; Information Retrieval; Assignment Grading
DOI
10.1145/3018661.3018691
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Code search with natural language terms performs poorly because programming concepts do not always lexically match their syntactic forms. For example, in Java, the programming concept "array" does not lexically match its syntactic representation, [ ]. Code search engines could serve natural language queries more effectively if such mappings existed for a variety of programming languages. In this work, we present a programming-language-agnostic technique to discover mappings between syntactic forms and the natural language terms that represent programming concepts. We use the questions and answers on Stack Overflow to create this mapping, and we implement our approach in a tool called ANNE. To evaluate its effectiveness, we conduct a user study in an academic setting in which teaching assistants use ANNE to search for code snippets in student submissions. With ANNE, participants are 29% quicker, with no significant drop in correctness or completeness.
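The abstract does not say how ANNE derives its mapping from Stack Overflow. Purely as an illustration of the idea of linking natural language terms to syntactic forms, the sketch below assumes a simple co-occurrence approach: words from a question's text are paired with syntactic tokens from the answer's code and ranked by pointwise mutual information (PMI). The functions `nl_terms`, `syntax_tokens`, and `pmi_mapping`, the tokenization rules, and the `min_count` threshold are all hypothetical, and this is not ANNE's actual algorithm.

```python
import math
import re
from collections import Counter
from itertools import product


def nl_terms(text):
    """Hypothetical: lowercase alphabetic words from a question's text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def syntax_tokens(code):
    """Hypothetical: "[]" (optionally with inner spaces) as one token,
    plus every other single non-word, non-space character."""
    return set(re.findall(r"\[\s*\]|[^\w\s]", code))


def pmi_mapping(posts, min_count=2):
    """Rank syntactic tokens for each natural language term by PMI.

    posts: list of (question_text, answer_code) pairs.
    min_count: hypothetical support threshold; pairs seen in fewer
    posts are dropped, since PMI overrates rare co-occurrences.
    Returns {term: [(token, pmi), ...]} sorted by descending PMI.
    """
    n = len(posts)
    term_df, tok_df, pair_df = Counter(), Counter(), Counter()
    for text, code in posts:
        terms, toks = nl_terms(text), syntax_tokens(code)
        term_df.update(terms)
        tok_df.update(toks)
        pair_df.update(product(terms, toks))  # each (term, token) counted once per post
    scores = {}
    for (term, tok), count in pair_df.items():
        if count < min_count:
            continue
        # PMI = log(P(term, tok) / (P(term) * P(tok))), estimated from post counts
        pmi = math.log(count * n / (term_df[term] * tok_df[tok]))
        scores.setdefault(term, []).append((tok, pmi))
    for ranked in scores.values():
        ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


if __name__ == "__main__":
    # Toy, made-up posts; not data from the paper.
    posts = [
        ("how to declare an array", "int[] a = {1, 2};"),
        ("sort an array of ints", "void sort(int[] a)"),
        ("print a string", 'System.out.println("hi");'),
        ("concatenate string values", 's = "a" + "b";'),
    ]
    mapping = pmi_mapping(posts)
    print(mapping["array"][0][0])   # prints []
    print(mapping["string"][0][0])  # prints "
```

On this toy input, "array" maps most strongly to `[]` and "string" to the double-quote token. PMI is chosen here only because it is a standard co-occurrence measure; the support threshold guards against its known bias toward tokens seen just once.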
Pages: 211-220
Number of pages: 10