A novel root based Arabic stemmer

Cited by: 48
Authors
Al-Kabi, Mohammed N.
Kazakzeh, Saif A. [2]
Abu Ata, Belal M. [2]
Al-Rababah, Saif A. [3]
Alsmadi, Izzat M. [1,4]
Affiliations
[1] Zarqa Univ, Fac Sci & IT, Zarqa 13110, Jordan
[2] Yarmouk Univ, IT & CS Fac, CIS Dept, Irbid 21163, Jordan
[3] Al Albayt Univ, IT Fac, Dept Informat Syst, Mafraq, Jordan
[4] Boise State Univ, Dept Comp Sci, Boise, ID 83725 USA
Keywords
Natural Language Processing (NLP); Computational intelligence; Stemming; Information retrieval
DOI
10.1016/j.jksuci.2014.04.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Stemming algorithms are used in information retrieval systems, indexers, text mining, text classifiers, etc., to extract the stems or roots of words, so that words derived from the same stem or root are grouped together. Many stemming algorithms have been built for different natural languages. The Khoja stemmer is one of the well-known and widely used Arabic stemmers. In this paper, we introduce a new light and heavy Arabic stemmer and compare it with two well-known Arabic stemmers. Results show that the accuracy of our stemmer is slightly better than that of each of the two stemmers used for evaluation and comparison. Evaluation tests on our novel stemmer yield 75.03% accuracy, while the other two Arabic stemmers yield slightly lower accuracy. © 2015 The Authors. Production and hosting by Elsevier B.V.
Pages: 94-103
Number of pages: 10