PATC: Parallel Arabic Text Classifier

Cited by: 0
Authors
Alshahrani, Mona [1]
Alkhalifa, Shurug [1]
Affiliations
[1] King Saud Univ, Informat Technol Dept, Riyadh, Saudi Arabia
Source
2018 21ST SAUDI COMPUTER SOCIETY NATIONAL COMPUTER CONFERENCE (NCC) | 2018
Keywords
component; Arabic Language; Multi-label Text Classification; MapReduce; Natural Language Processing; Text Mining;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202
Abstract
The volume of textual data has grown dramatically, and the data is becoming more complex every day. Managing, analyzing, summarizing, and understanding this data remains challenging and calls for new techniques that automatically organize, search, index, and browse large document collections. Text classification, an area of text mining, is the process of assigning text to predefined classes or topics. We developed a tool for Arabic text classification built on a parallel programming framework, called Parallel Arabic Text Classifier (PATC). It analyzes a labeled corpus of Arabic text supplied by the user and builds a text classifier from it. PATC consists of three major stages: (1) Preprocessing: PATC normalizes and stems the Arabic corpus before using it to train the classifier; (2) Training, or building the classifier: the classifier is trained on a user-uploaded, annotated Arabic corpus; and (3) Testing, or classifying: this stage predicts the class of a new document using the trained classifier. The classifier is built with an approach that associates each label with frequent words, implemented in the MapReduce distributed programming model. The classifier was evaluated on an Arabic corpus. Classification accuracy was around 80% under single-label measures and in the high 90% range under multi-label measures.
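The three stages named in the abstract can be illustrated with a small, single-process sketch. This is not the authors' implementation: the normalization rules, the crude article-stripping "stem", the `top_k` cutoff, the overlap-based scoring, and every function name below are assumptions chosen for illustration, and the map and reduce steps run in memory rather than on a MapReduce cluster.

```python
import re
from collections import defaultdict
from itertools import chain

DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")  # harakat and tatweel


def preprocess(text):
    """Stage 1 (assumed rules): normalize Arabic text, then apply a crude stem."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[أإآ]", "ا", text)  # unify alef variants
    text = text.replace("ى", "ي").replace("ة", "ه")
    tokens = re.findall(r"[\u0621-\u064A]+", text)
    # Stand-in for a real stemmer: drop the definite article from longer words.
    return [t[2:] if t.startswith("ال") and len(t) > 4 else t for t in tokens]


def map_doc(text, labels):
    """Map step: emit ((label, word), 1) once per distinct word in a document."""
    for word in set(preprocess(text)):
        for label in labels:
            yield (label, word), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (label, word) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts


def train(corpus, top_k=3):
    """Stage 2: associate each label with its top_k most frequent words."""
    counts = reduce_counts(chain.from_iterable(map_doc(t, ls) for t, ls in corpus))
    per_label = defaultdict(list)
    for (label, word), n in counts.items():
        per_label[label].append((n, word))
    return {
        label: {w for _, w in sorted(items, key=lambda x: (-x[0], x[1]))[:top_k]}
        for label, items in per_label.items()
    }


def classify(text, model):
    """Stage 3: return the label(s) whose frequent words overlap the text most."""
    words = set(preprocess(text))
    scores = {label: len(words & frequent) for label, frequent in model.items()}
    best = max(scores.values(), default=0)
    return sorted(l for l, s in scores.items() if s == best and s > 0)


# Toy labeled corpus (invented for this sketch, not the paper's data).
corpus = [
    ("فاز الفريق في المباراة", ["sports"]),
    ("خسر الفريق المباراة أمس", ["sports"]),
    ("ارتفع سعر النفط في السوق", ["economy"]),
    ("انخفض سعر النفط اليوم", ["economy"]),
]
model = train(corpus)
```

Because each document carries a list of labels, the same map step covers multi-label corpora. In a real MapReduce deployment, `map_doc` and `reduce_counts` would correspond to the mapper and reducer, with the framework handling the shuffle between them.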
Pages: 7