New labeled dataset of interconnected lexical typos for automatic correction in the bug reports

Authors
Behzad Soleimani Neysiani
Seyed Morteza Babamir
Affiliation
[1] University of Kashan
Source
SN Applied Sciences | 2019 / Vol. 1
Keywords
Natural language processing; Typo correction; Interconnected lexical typo; Tree structure; Bug reports; 68T50; 68T20; 68U15; 68P05; 68P10; 68P20; 68P30; 94A13; 68Q25; 68R15; 68W10; 68W32; 68W40; 05C05; C88; L86; D81; D83; L17; Z13
DOI
Not available
Abstract
Large-scale projects, especially open-source ones, use software triage systems such as Bugzilla to manage their users' requests, including bug reports, suggestions, and requirements. Software triage systems perform many tasks, such as automatically prioritizing bug reports, detecting duplicates, and assigning reports to developers, and these tasks require text mining, information retrieval, and natural language processing techniques. We previously showed that bug reports contain many typos, which reduce the performance of artificial intelligence techniques, and that connected terms (words written together without a separating space) were one of the most common types of typos in bug reports. We also introduced algorithms for correcting connected terms in earlier work, but no labeled dataset was available for evaluating the accuracy of the typo correction process. We now present a new labeled dataset containing 42,970 typos out of 182,096 for evaluating typo correction. Connected typos make up 52% of the labeled dataset, which confirms our earlier findings on the prevalence of connected typos. We then evaluated the accuracy of the typo correction algorithms introduced in prior studies on this dataset. The experimental results show 81.6% and 83.3% accuracy for the top-5 and top-10 correction suggestions, respectively.
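The abstract reports accuracy over the top-5 and top-10 entries of a ranked list of correction suggestions. As a minimal sketch of how such an evaluation could be set up for connected terms, the Python below enumerates ways to split a connected term into dictionary words, ranks the splits with a simple unigram language model, and computes top-k accuracy against gold labels. The unigram scoring is a stand-in chosen for illustration, not the authors' correction algorithm, and the function names (`split_candidates`, `rank_splits`, `top_k_accuracy`) and the toy frequency table are hypothetical.

```python
import math
from typing import Dict, List


def split_candidates(term: str, freq: Dict[str, int], max_parts: int = 3) -> List[List[str]]:
    """Enumerate every way to cut `term` into at most `max_parts` known words."""
    results: List[List[str]] = []

    def extend(start: int, parts: List[str]) -> None:
        if start == len(term):
            results.append(list(parts))  # reached the end: a complete split
            return
        if len(parts) == max_parts:
            return  # part budget exhausted before reaching the end of the term
        for end in range(start + 1, len(term) + 1):
            piece = term[start:end]
            if piece in freq:  # only dictionary words may form a part
                parts.append(piece)
                extend(end, parts)
                parts.pop()

    extend(0, [])
    return results


def rank_splits(term: str, freq: Dict[str, int], max_parts: int = 3) -> List[str]:
    """Rank candidate splits by unigram log-probability (higher is better).

    Assumption: a plain unigram model over `freq`; ties are broken alphabetically.
    """
    total = sum(freq.values())

    def score(parts: List[str]) -> float:
        return sum(math.log(freq[w] / total) for w in parts)

    candidates = split_candidates(term, freq, max_parts)
    ranked = sorted(candidates, key=lambda parts: (-score(parts), " ".join(parts)))
    return [" ".join(parts) for parts in ranked]


def top_k_accuracy(predictions: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of labeled typos whose gold correction is among the first k suggestions."""
    if not gold:
        return 0.0
    hits = sum(1 for typo, correct in gold.items()
               if correct in predictions.get(typo, [])[:k])
    return hits / len(gold)


# Toy data invented for illustration; it is not taken from the paper's dataset.
EXAMPLE_FREQ = {"not": 80, "found": 40, "no": 30, "tfound": 1,
                "the": 100, "rest": 10, "there": 50, "st": 30}
EXAMPLE_GOLD = {"notfound": "not found", "therest": "the rest"}

if __name__ == "__main__":
    preds = {typo: rank_splits(typo, EXAMPLE_FREQ) for typo in EXAMPLE_GOLD}
    print(preds)
    print("top-1:", top_k_accuracy(preds, EXAMPLE_GOLD, 1))
    print("top-2:", top_k_accuracy(preds, EXAMPLE_GOLD, 2))
```

With this toy table, "therest" is ranked "there st" ahead of the gold "the rest", so top-1 accuracy is 0.5 while top-2 accuracy is 1.0. This is why reporting accuracy over a longer suggestion list (top-5, top-10) gives a more forgiving picture than top-1 alone.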