CAPS: a supervised technique for classifying Stack Overflow posts concerning API issues

Cited by: 21
Authors
Ahasanuzzaman, Md [1]
Asaduzzaman, Muhammad [1]
Roy, Chanchal K. [2]
Schneider, Kevin A. [2]
Affiliations
[1] Queen's University, Software Analysis and Intelligence Lab (SAIL), Kingston, ON, Canada
[2] University of Saskatchewan, Department of Computer Science, Saskatoon, SK, Canada
Keywords
API issue; Unstructured data mining; Text classification; Feature extraction; Stack Overflow
DOI
10.1007/s10664-019-09743-4
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
The design and maintenance of APIs (Application Programming Interfaces) are complex tasks because the requirements of their users constantly change. Despite their designers' efforts, APIs may suffer from a number of issues, such as incomplete or erroneous documentation, poor performance, and backward incompatibility. To maintain a healthy client base, API designers must learn about these issues so that they can fix them. Question-answering sites such as Stack Overflow (SO) have become a popular place for discussing API issues. Posts about API issues are invaluable to API designers, not only because they help designers understand the problems but also because they reveal the requirements of API users. However, the unstructured nature of posts and the abundance of non-issue posts make detecting SO posts concerning API issues challenging. In this paper, we first develop a supervised learning approach that uses a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences. We combine this information with features collected from the posts, the experience of users, readability metrics, and centrality measures of the collaboration network to build a technique, called CAPS, that classifies SO posts concerning API issues. In total, we consider 34 features along eight different dimensions. An evaluation of CAPS on carefully curated SO posts covering three popular API types shows that the technique outperforms all three baseline approaches considered in this study. We then conduct studies to identify important features and evaluate the performance of the CRF-based technique for classifying issue sentences; a comparison with two other baseline approaches shows that this technique has high potential. We also test the generalizability of CAPS results, evaluate the effectiveness of different classifiers, and identify the impact of different feature sets.
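The abstract lists readability metrics and centrality measures of the collaboration network among CAPS's feature dimensions but does not say which metrics are used. The following minimal sketch, which is not the CAPS implementation, shows how two such post-level features could be computed. It assumes, purely for illustration, Flesch Reading Ease as the readability metric and normalized degree centrality over an undirected user-collaboration graph, in which an edge links two users who interacted on the same post. The helper names (`count_syllables`, `flesch_reading_ease`, `degree_centrality`, `post_features`) are hypothetical, and the syllable counter is only a rough heuristic.

```python
import re
from collections import defaultdict


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (y included)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent 'e' (e.g. "code") as non-syllabic, but keep "-le" (e.g. "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease score; higher values indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def degree_centrality(edges):
    """Normalized degree centrality (degree / (n - 1)) of an undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def post_features(body, author, collaboration_edges):
    """Hypothetical feature dict for one post: readability plus author centrality."""
    centrality = degree_centrality(collaboration_edges)
    return {
        "readability_flesch": flesch_reading_ease(body),
        # Authors absent from the collaboration graph get centrality 0.
        "author_degree_centrality": centrality.get(author, 0.0),
    }


if __name__ == "__main__":
    edges = [("alice", "bob"), ("alice", "carol"), ("bob", "dave")]
    print(post_features("The method throws an exception. Why?", "alice", edges))
```

In a full classifier, dictionaries like these would be combined with the other feature dimensions (for example, the CRF-derived issue-sentence information) into one vector per post. The two features here only illustrate the general shape of that step.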
Pages: 1493-1532
Number of pages: 40