Mining software architecture knowledge: Classifying stack overflow posts using machine learning

Cited by: 6
Authors
Ali, Mubashir [1]
Mushtaq, Husnain [2]
Rasheed, Muhammad B. [3,4]
Baqir, Anees [5]
Alquthami, Thamer [6]
Affiliations
[1] Univ Bergamo, DIGIP, Bergamo, Italy
[2] Univ Lahore, Dept Comp Sci, Gujrat, Pakistan
[3] Univ Alcala, Dept Comp Engn, Madrid 28801, Spain
[4] Univ Lahore, Dept Elect & Elect Syst, Lahore, Pakistan
[5] Ca Foscari Univ Venice, Dept Environm Sci Informat & Stat, Venice, Italy
[6] King Abdulaziz Univ, Dept Elect Engn & Comp Engn, Jeddah, Saudi Arabia
Keywords
architectural knowledge management; stack overflow; crowd-sourced communities; text mining; classification
DOI
10.1002/cpe.6277
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
The Software Architectural Process (SAP) is a core and highly knowledge-intensive phase of the software development life cycle, as it simultaneously consumes and produces knowledge artifacts. SAP is about making design decisions, and changes to these decisions may adversely affect software projects. Design decisions fundamentally influence the performance and properties of software components, and implementing immature or abrupt design decisions seriously threatens the SAP development process. Software architectural knowledge management (AKM) approaches offer systematic ways to support SAP through versatile architectural solutions and design decisions. However, most software organizations have limited access to data and still depend on manually created and maintained AKM processes. In this paper, we use one of the most prominent online communities for software development, Stack Overflow, as a source of SAP knowledge to support AKM. To this end, we propose a supervised machine learning-based approach that classifies architectural knowledge into four predefined categories: analysis, synthesis, evaluation, and implementation. We employ different combinations of feature selection techniques to obtain the best classification results from the classifiers used (Support Vector Machine [SVM], K-Nearest Neighbor, Random Forest, and Naive Bayes [NB]). Among these classifiers, SVM with the uni-gram feature set gives the best classification results, attaining 85.80% accuracy. To evaluate the effectiveness of the proposed approach, we also compute the suitability of each classifier, that is, its computational cost together with its accuracy; by this measure, NB with the uni-gram feature set proved to be the most suitable.
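To make the idea of classifying posts with a uni-gram feature set concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a simple lowercase word tokenizer and uses a multinomial Naive Bayes classifier with add-one (Laplace) smoothing written with the Python standard library only. The four category names come from the abstract; the training and test "posts" are invented toy strings, and the `unigrams`, `UnigramNaiveBayes`, and `accuracy` names are hypothetical. The sketch does not reproduce the reported 85.80% figure.

```python
import math
import re
from collections import Counter, defaultdict


def unigrams(text):
    """Assumed tokenizer: lowercase the text and keep alphanumeric word uni-grams."""
    return re.findall(r"[a-z0-9]+", text.lower())


class UnigramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over uni-gram counts, add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of posts per category
        self.word_counts = defaultdict(Counter)  # uni-gram counts per category
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = unigrams(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = unigrams(text)
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # log prior of the category
            score = math.log(n_c / self.n_docs)
            for t in tokens:
                # tokens never seen during training are ignored
                if t in self.vocab:
                    score += math.log((self.word_counts[c][t] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy posts, one per category; NOT data from the paper.
TRAIN = [
    ("gather quality requirements constraints", "analysis"),
    ("choose architecture pattern microservices", "synthesis"),
    ("assess tradeoff performance risk", "evaluation"),
    ("implement code framework library", "implementation"),
]
model = UnigramNaiveBayes().fit([t for t, _ in TRAIN], [c for _, c in TRAIN])

TEST = [
    ("which pattern should I choose", "synthesis"),
    ("assess the risk", "evaluation"),
    ("which library to implement", "implementation"),
    ("list the requirements", "analysis"),
]
predictions = [model.predict(t) for t, _ in TEST]
print(predictions)
print(accuracy([c for _, c in TEST], predictions))
```

Training here is a single counting pass over the tokens, so its cost grows linearly with the size of the corpus; swapping in an SVM or Random Forest would use the same uni-gram features but a different learner.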
Pages: 17