Predicting the objective and priority of issue reports in software repositories

Times cited: 39
Authors
Izadi, Maliheh [1]
Akbari, Kiana [1]
Heydarnoori, Abbas [1]
Affiliation
[1] Sharif Univ Technol, Intelligent Software Engn Lab, Tehran, Iran
Keywords
Software evolution and maintenance; Mining software repositories; Issue reports; Classification; Prioritization; Machine learning; Natural language processing; INTERRATER RELIABILITY; KAPPA; CODE; COEFFICIENT; USAGE
DOI
10.1007/s10664-021-10085-3
Chinese Library Classification (CLC) code
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Software repositories such as GitHub host a large number of software entities. Developers collaboratively discuss, implement, use, and share these entities. Proper documentation plays an important role in successful software management and maintenance. Users rely on issue tracking systems, a facility of software repositories, to keep track of issue reports, to manage the workload and processes, and finally, to document the highlights of their team's effort. An issue report is a rich source of collaboratively curated software knowledge and can contain a reported problem, a request for a new feature, or merely a question about the software product. As the number of issues grows, managing them manually becomes harder. GitHub provides labels for tagging issues as a means of issue management. However, about half of the issues in GitHub's top 1000 repositories do not have any labels. In this work, we aim to automate the process of managing issue reports for software teams. We propose a two-stage approach that predicts both the objective behind opening an issue and its priority level, using feature engineering methods and state-of-the-art text classifiers. To the best of our knowledge, we are the first to fine-tune a Transformer for issue classification. We train and evaluate our models in both project-based and cross-project settings. The latter provides a generic prediction model applicable to unseen software projects or to projects with little historical data. Our proposed approach predicts the objective and the priority level of issue reports with 82% (fine-tuned RoBERTa) and 75% (Random Forest) accuracy, respectively. Moreover, we conducted human labeling and evaluation on unlabeled issues from six unseen GitHub projects to assess the performance of the cross-project model on new data. The model achieves 90% accuracy on this sample set. We measure inter-rater reliability and obtain an average Percent Agreement of 85.3% and an average Randolph's free-marginal Kappa of 0.71, which indicate substantial agreement among labelers.
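To make the two agreement measures concrete, here is a minimal sketch using the standard multirater definitions: Percent Agreement is the fraction of rater pairs that agree on an item, averaged over items, and Randolph's free-marginal kappa rescales it against a chance level of 1/k, where k is the number of categories. This is an illustration under those textbook definitions, not the authors' evaluation code; the function names and the toy labels below are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Observed agreement P_o, averaged over items.

    ratings[i] holds the labels that the raters assigned to item i;
    each item needs at least two ratings.
    """
    total = 0.0
    for item in ratings:
        n = len(item)
        counts = Counter(item)
        # Fraction of ordered rater pairs that gave this item the same label.
        total += sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return total / len(ratings)


def free_marginal_kappa(ratings: Sequence[Sequence[Hashable]],
                        num_categories: int) -> float:
    """Randolph's free-marginal kappa: (P_o - 1/k) / (1 - 1/k)."""
    p_o = percent_agreement(ratings)
    p_e = 1.0 / num_categories  # chance agreement when raters are not bound to fixed marginals
    return (p_o - p_e) / (1.0 - p_e)


# Toy data (hypothetical): three raters label four issues into three categories.
ratings = [
    ["bug", "bug", "bug"],
    ["bug", "bug", "question"],
    ["enhancement", "enhancement", "enhancement"],
    ["question", "bug", "enhancement"],
]
```

For this toy data the per-item agreements are 1, 1/3, 1, and 0, so `percent_agreement(ratings)` is 7/12 (about 58.3%) and `free_marginal_kappa(ratings, 3)` is 0.375. The free-marginal variant is typically chosen when raters may assign any number of items to each category, so chance agreement depends only on k.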
Pages: 37