Supporting concept location through identifier parsing and ontology extraction

Cited by: 9
Authors
Abebe, Surafel Lemma [1]
Alicante, Anita [2]
Corazza, Anna [2]
Tonella, Paolo [1]
Affiliations
[1] Fondazione Bruno Kessler, Trento, Italy
[2] University of Naples Federico II, Naples, Italy
Keywords
Program understanding; Concept location; Natural language parsing; Source code; Evolution
DOI
10.1016/j.jss.2013.07.009
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Identifier names play a key role in program understanding and in particular in concept location. Programmers can easily "parse" identifiers and understand the intended meaning. This, however, is not trivial for tools that try to exploit the information in the identifiers to support program understanding. To address this problem, we resort to natural language analyzers, which parse tokenized identifier names and provide the syntactic relationships (dependencies) among the terms composing the identifiers. Such relationships are then mapped to semantic relationships. In this study, we have evaluated the use of off-the-shelf and trained natural language analyzers to parse identifier names, extract an ontology and use it to support concept location. In the evaluation, we assessed whether the concepts taken from the ontology can be used to improve the efficiency of queries used in concept location. We have also investigated if the use of different natural language analyzers has an impact on the ontology extracted and the support it provides to concept location. Results show that using the concepts from the ontology significantly improves the efficiency of concept location queries (e.g., in some cases, an improvement of 127% is observed). The results also indicate that the efficiency of concept location queries is not affected by the differences in the ontologies produced by different analyzers. (C) 2013 Elsevier Inc. All rights reserved.
Pages: 2919-2938
Number of pages: 20
Related papers
45 in total
  • [1] Abebe S. L., 2011, 2011 18th Working Conference on Reverse Engineering, P77, DOI 10.1109/WCRE.2011.19
  • [2] Abebe Surafel Lemma, 2010, Proceedings of the 18th IEEE International Conference on Program Comprehension (ICPC 2010), P156, DOI 10.1109/ICPC.2010.29
  • [3] Analyzing the Evolution of the Source Code Vocabulary
    Abebe, Surafel Lemma
    Haiduc, Sonia
    Marcus, Andrian
    Tonella, Paolo
    Antoniol, Giuliano
    [J]. 13TH EUROPEAN CONFERENCE ON SOFTWARE MAINTENANCE AND REENGINEERING: CSMR 2009, PROCEEDINGS, 2009, : 189 - 198
  • [4] [Anonymous], 2003, P 8 INT C PARSING TE
  • [5] [Anonymous], 1993, COMPUT LINGUIST, DOI 10.21236/ADA273556
  • [6] CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING
    BENJAMINI, Y
    HOCHBERG, Y
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) : 289 - 300
  • [7] BIGGERSTAFF TJ, 1993, PROC INT CONF SOFTW, P482, DOI 10.1109/ICSE.1993.346017
  • [8] Binkley David W., 2011, Proceedings of the 8th Working Conference on Mining Software Repositories, P203, DOI 10.1145
  • [9] Butler S., 2011, 2011 IEEE 27th International Conference on Software Maintenance, P93, DOI 10.1109/ICSM.2011.6080776
  • [10] Wohlin Claes, Runeson Per, 2012, Experimentation in Software Engineering