Optimized Machine Learning Input for Evolutionary Source Code to Architecture Mapping

Authors
Olsson, Tobias [1]
Ericsson, Morgan [1]
Wingkvist, Anna [1]
Affiliation
[1] Linnaeus Univ, Dept Comp Sci & Media Technol, Kalmar Vaxjo, Sweden
Source
SOFTWARE ARCHITECTURE. ECSA 2022 TRACKS AND WORKSHOPS | 2023 / Vol. 13928
Keywords
Orphan Adoption; Software Architecture; Clustering
DOI
10.1007/978-3-031-36889-9_28
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Automatically mapping source code to architectural modules is an interesting and difficult problem. Mapping can be treated as a classification problem, and machine learning approaches have been used to generate such mappings automatically. Feature engineering is an essential element of machine learning. We study which source code features are important for an algorithm to perform effectively, and we additionally examine the effects of stemming and data cleaning. We systematically evaluate various combinations of features on five datasets created from JabRef, TeamMates, ProM, and two Hadoop subsystems; all are open-source systems with well-established mappings. We find that no single set of features consistently provides the highest performance, and even the two Hadoop subsystems have different optimal feature combinations. Stemming provided minimal benefit, and data cleaning is not worth the effort, as its benefit was also minimal.
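To make the idea of "evaluating combinations of features" concrete, here is a minimal sketch of treating the mapping as text classification and scoring every non-empty combination of feature types. It assumes a multinomial Naive Bayes classifier with add-one smoothing; the classifier choice, the three feature types (`identifiers`, `comments`, `dependencies`), the helper names, and the toy entities are all hypothetical illustrations, not the paper's actual setup or data. Stemming and data cleaning are deliberately omitted.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical feature types per source code entity (illustrative, not the paper's set).
FEATURE_TYPES = ("identifiers", "comments", "dependencies")


def tokenize(text):
    """Split camelCase, snake_case, and dotted names into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", text)]


def extract_tokens(entity, features):
    """Concatenate the tokens of the selected feature types only."""
    tokens = []
    for feature in features:
        tokens.extend(tokenize(entity.get(feature, "")))
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing; an assumed classifier."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = max(len(self.vocab), 1)  # avoid division by zero on an empty vocabulary
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(count / n_docs)
            for w in tokens:
                score += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best


def evaluate(train, test, features):
    """Train on `train` using only `features`; return accuracy on `test`."""
    model = NaiveBayes().fit(
        [extract_tokens(e, features) for e, _ in train],
        [module for _, module in train],
    )
    correct = sum(model.predict(extract_tokens(e, features)) == m for e, m in test)
    return correct / len(test)


def rank_feature_combinations(train, test):
    """Score every non-empty combination of feature types, best first."""
    results = []
    for r in range(1, len(FEATURE_TYPES) + 1):
        for combo in combinations(FEATURE_TYPES, r):
            results.append((evaluate(train, test, combo), combo))
    return sorted(results, key=lambda item: item[0], reverse=True)


# Hypothetical toy entities mapped to two modules; not data from the studied systems.
TRAIN = [
    ({"identifiers": "MainFrame showDialog", "comments": "render window", "dependencies": "javax.swing"}, "gui"),
    ({"identifiers": "EntryEditor showPanel", "comments": "draw editor window", "dependencies": "javax.swing"}, "gui"),
    ({"identifiers": "BibEntry getField", "comments": "bibliographic entry data", "dependencies": "java.util"}, "model"),
    ({"identifiers": "BibDatabase getEntries", "comments": "stores entry data", "dependencies": "java.util"}, "model"),
]
TEST_SET = [
    ({"identifiers": "SearchDialog showResults", "comments": "window for search", "dependencies": "javax.swing"}, "gui"),
    ({"identifiers": "EntryType getName", "comments": "entry data type", "dependencies": "java.util"}, "model"),
]

if __name__ == "__main__":
    for accuracy, combo in rank_feature_combinations(TRAIN, TEST_SET):
        print(f"{accuracy:.2f}  {' + '.join(combo)}")
```

With only three feature types there are seven combinations, so exhaustive enumeration is cheap; with many more feature types the number of combinations grows exponentially and a real study would need a larger, per-system evaluation, which is consistent with the abstract's finding that the best combination varies between systems.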
Pages: 421-435
Number of pages: 15