Stylometric analysis of classical Arabic texts for genre detection

Times cited: 8
Author
Al-Yahya, Maha [1]
Affiliation
[1] King Saud University, Department of Information Technology, Riyadh, Saudi Arabia
Keywords
Stylometric analysis; Genre detection; Classical Arabic text; Distance measure; Authorship attribution
DOI
10.1108/EL-11-2017-0236
Chinese Library Classification (CLC) numbers
G25 [Library science, library work]; G35 [Information science, information work]
Subject classification codes
1205; 120501
Abstract
Purpose: In information retrieval, a text's genre is as important as its content, and knowing the genre enhances search engines by enabling customized retrieval. This study explores and evaluates the use of stylometric analysis, the quantitative analysis of a text's linguistic features, to support automated genre detection for Classical Arabic text.
Design/methodology/approach: Unsupervised clustering and supervised classification were applied to the King Saud University Corpus of Classical Arabic (KSUCCA), using the most frequent words in the corpus (MFWs) as stylometric features. Four popular distance measures established in stylometric research are evaluated for the genre detection task.
Findings: The experiments show that stylometry-based genre clustering and classification align well with human-defined genres. The evidence suggests that genre style signals exist in Classical Arabic and can be used to support automated genre detection.
Originality/value: This work targets genre detection in Classical Arabic text using stylometric features, an approach previously applied to Arabic only for authorship attribution. The study also compares, through both clustering and classification, four distance measures used in stylometric analysis on KSUCCA, a corpus of over 50 million words of Classical Arabic.
Pages: 842-855
Number of pages: 14
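The pipeline the abstract describes (relative frequencies of the corpus's most frequent words, a stylometric distance measure, then genre assignment) can be illustrated with a minimal sketch. The abstract does not name its four distance measures, so Burrows's Delta and Cosine Delta appear here only as two widely used stylometric distances, not as a statement of the paper's choices. The whitespace tokenizer, the leave-one-out 1-nearest-neighbour classifier, all function names, and the toy transliterated corpus are hypothetical assumptions made for illustration.

```python
# Minimal sketch, NOT the paper's implementation. Assumptions: whitespace
# tokenisation, z-scored MFW relative frequencies, Burrows's Delta / Cosine
# Delta as example distances, leave-one-out 1-nearest-neighbour labelling.
from collections import Counter
from math import sqrt
from statistics import mean, pstdev
from typing import Callable

Vector = list[float]


def tokenize(text: str) -> list[str]:
    # Hypothetical whitespace tokeniser; real Classical Arabic input would
    # need normalisation (diacritics, clitics) that is not modelled here.
    return text.split()


def most_frequent_words(docs: list[list[str]], n: int) -> list[str]:
    # The n most frequent word types across the whole corpus (the MFW list).
    counts = Counter(tok for doc in docs for tok in doc)
    return [word for word, _ in counts.most_common(n)]


def relative_freqs(doc: list[str], mfw: list[str]) -> Vector:
    # Relative frequency of each MFW within one document.
    counts = Counter(doc)
    total = len(doc) or 1
    return [counts[w] / total for w in mfw]


def zscore_columns(rows: list[Vector]) -> list[Vector]:
    # Standardise each MFW feature across all documents in the corpus.
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def burrows_delta(a: Vector, b: Vector) -> float:
    # Burrows's Delta: mean absolute difference between z-score vectors.
    return mean(abs(x - y) for x, y in zip(a, b))


def cosine_delta(a: Vector, b: Vector) -> float:
    # Cosine Delta: 1 minus the cosine similarity of the z-score vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def classify_leave_one_out(texts: list[str], labels: list[str], n_mfw: int,
                           distance: Callable[[Vector, Vector], float]) -> list[str]:
    # For each document, predict the genre of its nearest neighbour among
    # the remaining documents under the chosen distance (1-NN).
    docs = [tokenize(t) for t in texts]
    mfw = most_frequent_words(docs, n_mfw)
    z = zscore_columns([relative_freqs(d, mfw) for d in docs])
    preds = []
    for i in range(len(z)):
        nearest = min((j for j in range(len(z)) if j != i),
                      key=lambda j: distance(z[i], z[j]))
        preds.append(labels[nearest])
    return preds


# Invented toy corpus of transliterated tokens with two made-up genres.
texts = [
    "qala haddathana qala haddathana an",
    "qala haddathana an qala an",
    "ya layla ya qalbi layla",
    "ya qalbi layla ya ya",
]
labels = ["A", "A", "B", "B"]
print(classify_leave_one_out(texts, labels, n_mfw=6, distance=burrows_delta))
# ['A', 'A', 'B', 'B']
```

Passing a different function as `distance` is one simple way to compare measures under otherwise identical conditions. For the clustering side of such a study, the same pairwise distances could instead be passed to a hierarchical clustering routine, such as SciPy's `scipy.cluster.hierarchy.linkage`, which accepts a condensed distance matrix.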