Applying short text topic models to instant messaging communication of software developers

Cited by: 2
Authors
Silva, Camila Costa [1]
Galster, Matthias [1]
Gilson, Fabian [1]
Affiliations
[1] Univ Canterbury, Christchurch, New Zealand
Keywords
Short text topic modeling; Instant messaging; Topic coherence; Word intrusion; Topic intrusion; Topic naming; CLASSIFIER CONFIGURATION; STAR RATINGS; CONSISTENCY; REVIEWS; IMPACT;
DOI
10.1016/j.jss.2024.112111
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
When modeling topics from developer instant messaging communication, individual chat messages are short text documents. Our study aims to understand how short text topic models perform on conversations from developer instant messaging. We applied four models to nine Gitter chat rooms (with sizes ranging from approximately 100 to approximately 160,000 messages). To assess the quality of topics and identify the best-performing models, we compared topics based on four metrics for topic coherence. Furthermore, for a subset of Gitter chat rooms, we used two human-based assessments: intrusion tasks with 18 experts analyzing 40 topics each, and topic naming (assigning a name to a topic that summarizes its main concept) with eight additional experts naming 60 topics each. Models performed differently in terms of coherence metrics and human assessment depending on the corpus (small, medium, or large chat room). Our findings offer recommendations for the selection and use of short text topic models with developer chat messages, based on the characteristics of the models, their performance with corpora of different sizes, and different strategies for assessing topic quality.
Pages: 29