Predicting the objective and priority of issue reports in software repositories

Cited by: 39
Authors
Izadi, Maliheh [1]
Akbari, Kiana [1]
Heydarnoori, Abbas [1]
Affiliations
[1] Sharif Univ Technol, Intelligent Software Engn Lab, Tehran, Iran
Keywords
Software evolution and maintenance; Mining software repositories; Issue reports; Classification; Prioritization; Machine learning; Natural language processing; Interrater reliability; Kappa; Code; Coefficient; Usage
DOI
10.1007/s10664-021-10085-3
Chinese Library Classification
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Software repositories such as GitHub host a large number of software entities. Developers collaboratively discuss, implement, use, and share these entities. Proper documentation plays an important role in successful software management and maintenance. Users exploit Issue Tracking Systems, a facility of software repositories, to keep track of issue reports, to manage the workload and processes, and finally, to document the highlights of their team's effort. An issue report is a rich source of collaboratively curated software knowledge, and can contain a reported problem, a request for new features, or merely a question about the software product. As the number of these issues increases, it becomes harder to manage them manually. GitHub provides labels for tagging issues as a means of issue management. However, about half of the issues in GitHub's top 1000 repositories do not have any labels. In this work, we aim to automate the process of managing issue reports for software teams. We propose a two-stage approach to predict both the objective behind opening an issue and its priority level using feature engineering methods and state-of-the-art text classifiers. To the best of our knowledge, we are the first to fine-tune a Transformer for issue classification. We train and evaluate our models in both project-based and cross-project settings. The latter approach provides a generic prediction model applicable to any unseen software project or to projects with little historical data. Our proposed approach can successfully predict the objective and priority level of issue reports with 82% (fine-tuned RoBERTa) and 75% (Random Forest) accuracy, respectively. Moreover, we conducted human labeling and evaluation on unlabeled issues from six unseen GitHub projects to assess the performance of the cross-project model on new data. The model achieves 90% accuracy on the sample set. We measure inter-rater reliability and obtain an average Percent Agreement of 85.3% and a Randolph's free-marginal Kappa of 0.71, which translate to substantial agreement among labelers.
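
The two reliability measures named in the abstract can be illustrated with a short sketch. It is not taken from the paper: the function names, the three category labels, and the sample ratings are hypothetical, and it assumes every item is labeled by the same number of raters. Percent agreement is computed as the share of agreeing rater pairs per item, averaged over items. Randolph's free-marginal kappa corrects that value with a fixed chance agreement of 1/k for k categories.

```python
from itertools import combinations


def percent_agreement(ratings):
    """Share of rater pairs that agree on an item, averaged over all items."""
    per_item = []
    for item in ratings:
        pairs = list(combinations(item, 2))
        per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_item) / len(per_item)


def free_marginal_kappa(ratings, num_categories):
    """Randolph's free-marginal kappa: chance agreement is fixed at 1/k."""
    p_observed = percent_agreement(ratings)
    p_expected = 1.0 / num_categories
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical data (not from the paper): 5 issues, 3 raters per issue.
ratings = [
    ["bug", "bug", "bug"],
    ["enhancement", "enhancement", "bug"],
    ["support", "support", "support"],
    ["bug", "bug", "bug"],
    ["enhancement", "support", "enhancement"],
]

print(f"percent agreement: {percent_agreement(ratings):.3f}")
print(f"free-marginal kappa: {free_marginal_kappa(ratings, 3):.3f}")
```

Because the chance term depends only on the number of categories, the kappa value does not shift with how often each label happens to be used, which is one reason to prefer it when raters are free to assign any label to any item.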
Pages: 37