Subsequence Kernels-Based Arabic Text Classification

Cited by: 0
Authors
Nehar, Attia [2 ]
Benmessaoud, Abdelkader [1 ]
Cherroun, Hadda [1 ]
Ziadi, Djelloul [3 ]
Affiliations
[1] Univ Amar Telidji, Lab Informat & Math, Laghouat, Algeria
[2] Univ Ziane Achour, Djelfa, Algeria
[3] Normandie Univ, Lab LITIS, EA 4108, Rouen, France
Source
2014 IEEE/ACS 11TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA) | 2014
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
Kernel methods have enjoyed great success in machine learning, mainly because of their flexibility in handling the high-dimensional feature spaces of complex data such as graphs, trees, or text. In text classification (TC), their performance has surpassed that of traditional algorithms. For textual data, several kernels have been introduced (p-spectrum, all-subsequences, gap-weighted subsequences, ...) to improve the performance of TC systems. In this paper, we present a system for Arabic TC that takes into account the order and co-occurrence of words within a text. Documents are represented by transducers, a specific kind of automaton; this representation allows an efficient implementation of the subsequence kernel. An empirical study evaluates the Arabic TC (ATC) system on the large SPA corpus. The results show an improvement in classification precision.
Pages: 206-213
Page count: 8