Feed-forward and recurrent neural networks for source code informal information analysis

Cited by: 15
Authors
Merlo, E
McAdam, I
De Mori, R
Affiliations
[1] Ecole Polytech, Dept Genie Informat, Montreal, PQ H3C 3A7, Canada
[2] Calidris, IS-108 Reykjavik, Iceland
[3] Univ Avignon, Ctr Enseignement & Rech Informat, Lab Informat, F-84911 Avignon 9, France
Source
JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION-RESEARCH AND PRACTICE | 2003 / Vol. 15 / No. 04
Keywords
feed-forward networks; recurrent neural networks; design recovery; program understanding; informal information analysis;
DOI
10.1002/smr.274
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Design recovery, which is a part of the reverse engineering process of source code, must supply programmers with all the information they need to fully understand a program or a system. In this paper, a connectionist method that can be used for design recovery in conjunction with more traditional approaches is proposed for analyzing the informal information (comments and mnemonics) in programs. An approach based on artificial neural networks (ANNs) was chosen because of its robustness (capable of tolerating noisy inputs), its associative memory ability (capable of retrieving a concept given only the context of the input word that originally fired the concept), and its generalization power (ability to learn conceptually relevant micro-features of the domain). The proposed approach combines top-down domain analysis (i.e., the creation of a concept hierarchy by a domain expert, to be used in the construction of the training set) with a bottom-up approach (i.e., the analysis of the informal information using ANNs). A preprocessing system that extracts the relevant comments and identifier names and transforms them into an input for the ANNs has been developed. Feed-forward neural networks (FNNs) and recurrent neural networks (RNNs) were tried. RNN architectures are capable of learning sequences and are able to make use of the word ordering of the sentence. The networks were trained on part of the source code of an existing system and tested on a different portion of the system code. Test results, consisting of coverage and evaluation figures, are presented. They show markedly higher accuracy for ANNs in general than for simple lexical methods. RNNs, in particular, also show higher coverage and accuracy than FNNs. Copyright (C) 2003 John Wiley & Sons, Ltd.
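The abstract's contrast between order-insensitive and order-sensitive inputs can be illustrated with a minimal sketch. This is not the authors' preprocessing system: the function names (`split_identifier`, `bag_of_words`, `sequence_ids`), the splitting rules, and the toy vocabulary are all assumptions made for illustration. It shows only that a bag-of-words vector (one plausible kind of input for a feed-forward network) discards word order, while an index sequence (one plausible kind of input for a recurrent network) keeps it.

```python
import re

def split_identifier(name):
    """Split an identifier (camelCase or snake_case) into lowercase words.

    Hypothetical helper; the paper's actual extraction rules are not given here.
    """
    words = []
    for part in re.split(r"[_\W]+", name):
        # Acronyms, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]

def bag_of_words(words, vocab):
    """Order-insensitive encoding: one count per vocabulary entry."""
    vector = [0] * len(vocab)
    for w in words:
        if w in vocab:
            vector[vocab[w]] += 1
    return vector

def sequence_ids(words, vocab):
    """Order-preserving encoding: vocabulary indices in reading order."""
    return [vocab[w] for w in words if w in vocab]

# Toy vocabulary (assumed for illustration only).
vocab = {"file": 0, "name": 1, "read": 2}

a = split_identifier("readFileName")
b = split_identifier("name_file_read")

# Same words, different order: the bags match, the sequences do not.
same_bag = bag_of_words(a, vocab) == bag_of_words(b, vocab)
same_sequence = sequence_ids(a, vocab) == sequence_ids(b, vocab)
```

Under these assumptions, two identifiers built from the same words in a different order are indistinguishable as bags but distinct as sequences, which is the kind of information a sequence-learning architecture can exploit.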
Pages: 205-244
Page count: 40