Analysis and Detection of Information Types of Open Source Software Issue Discussions

Citations: 57
Authors
Arya, Deeksha [1]
Wang, Wenting [1]
Guo, Jin L. C. [1]
Cheng, Jinghui [2]
Affiliations
[1] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada
[2] Polytech Montreal, Dept Comp & Software Engn, Montreal, PQ, Canada
Source
2019 IEEE/ACM 41ST INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2019) | 2019
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
collaborative software engineering; issue tracking system; issue discussion analysis; AGREEMENT;
DOI
10.1109/ICSE.2019.00058
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most modern Issue Tracking Systems (ITSs) for open source software (OSS) projects allow users to add comments to issues. Over time, these comments accumulate into discussion threads embedded with rich information about the software project, which can potentially satisfy the diverse needs of OSS stakeholders. However, discovering and retrieving relevant information from the discussion threads is a challenging task, especially when the discussions are lengthy and the number of issues in ITSs is vast. In this paper, we address this challenge by identifying the information types presented in OSS issue discussions. Through qualitative content analysis of 15 complex issue threads across three projects hosted on GitHub, we uncovered 16 information types and created a labeled corpus containing 4656 sentences. Our investigation of supervised, automated classification techniques indicated that, when prior knowledge about the issue is available, Random Forest can effectively detect most sentence types using conversational features such as the sentence length and its position. When classifying sentences from new issues, Logistic Regression can yield satisfactory performance using textual features for certain information types, while falling short on others. Our work represents a nontrivial first step towards tools and techniques for identifying and obtaining the rich information recorded in ITSs to support various software engineering activities and to satisfy the diverse needs of OSS stakeholders.
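The abstract names sentence length and sentence position as examples of conversational features. The following is a minimal sketch of how such features might be computed from an issue thread, not the authors' implementation. The names `SentenceFeatures`, `split_sentences`, `relative_position`, and `conversational_features` are hypothetical, the sentence splitter is deliberately naive, and the record does not say how the paper normalizes positions or which other features it uses.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SentenceFeatures:
    """Hypothetical feature row for one sentence of an issue thread."""
    text: str
    length_tokens: int     # sentence length in whitespace-separated tokens
    pos_in_comment: float  # 0.0 = first sentence of its comment, 1.0 = last
    pos_in_thread: float   # 0.0 = first comment of the thread, 1.0 = last


def split_sentences(comment: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", comment.strip())
    return [p for p in parts if p]


def relative_position(index: int, total: int) -> float:
    """Scale an index to [0, 1]; a single-element sequence maps to 0.0."""
    return index / (total - 1) if total > 1 else 0.0


def conversational_features(thread: List[str]) -> List[SentenceFeatures]:
    """Compute length and position features for every sentence in a thread.

    `thread` is a list of comment bodies in posting order.
    """
    rows = []
    for c_idx, comment in enumerate(thread):
        sentences = split_sentences(comment)
        for s_idx, sentence in enumerate(sentences):
            rows.append(SentenceFeatures(
                text=sentence,
                length_tokens=len(sentence.split()),
                pos_in_comment=relative_position(s_idx, len(sentences)),
                pos_in_thread=relative_position(c_idx, len(thread)),
            ))
    return rows


if __name__ == "__main__":
    example_thread = [
        "The build fails on Windows. Can anyone reproduce this?",
        "Yes, I see it too.",
        "Fixed in the latest commit. Thanks!",
    ]
    for row in conversational_features(example_thread):
        print(row)
```

Rows like these could then be passed, together with sentence labels, to any supervised classifier. The abstract reports Random Forest working well with such features when prior knowledge about the issue is available.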
Pages: 454-464
Number of pages: 11