Multi-Lingual Author Identification and Linguistic Feature Extraction - a Machine Learning Approach

Times cited: 0
Authors
Alam, Hassan [1]
Kumar, Aman [1]
Affiliation
[1] BCL Technol, San Jose, CA 95128 USA
Source
2013 IEEE INTERNATIONAL CONFERENCE ON TECHNOLOGIES FOR HOMELAND SECURITY (HST) | 2013
Keywords
Author Identification; Semantic Features; Feature Extraction; NLP; Support Vector Machine; MESSAGES
DOI
Not available
CLC classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Internet-based services have emerged as one of the most effective platforms for expressing and exchanging views. Most of these services allow anonymous postings. Lately, it has been observed that anonymous postings can be responsible for instigating violence or causing panic. Some studies have attempted to identify the authors of such blogs, mostly targeting English postings. Current author identification systems do not employ rich morphological features for languages such as Arabic (Modern Standard Arabic). In this study, we develop novel semantic features to aid author identification for Arabic. To fully exploit the rich morphology of Arabic, we use parse trees as features. The overall approach combines language-specific NLP parsers, lexicons, semantic processing, thematic role assignment, semantic heuristics, and machine learning techniques to rapidly train systems that capture these subtleties of Arabic. Our system identifies authors on the basis of stylistic and linguistic similarities between an author's existing works and unattributed text such as online blogs and articles. We use a support vector machine (SVM) to identify authors based on these novel features. Our approach achieves 98% accuracy on Arabic blogs related to law and order and terrorism.
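The abstract does not say how parse trees are turned into SVM features. One standard option, associated with the tree-kernel work listed as [9] and [10], is a convolution tree kernel that scores two parse trees by the tree fragments they share. The sketch below assumes the subset-tree kernel of Collins and Duffy with a decay factor `lam`. The tuple tree encoding and the function names (`sst_kernel`, `normalized_sst_kernel`) are hypothetical illustrations, not the authors' implementation.

```python
import math

# Hypothetical tree encoding (an assumption for this sketch): a leaf (word) is a
# str; an internal node is a tuple (label, child_1, ..., child_k).


def production(node):
    """Return the grammar rule at `node`, tagging word vs. nonterminal children."""
    label, *children = node
    rhs = tuple(("word", c) if isinstance(c, str) else ("node", c[0]) for c in children)
    return (label, rhs)


def is_preterminal(node):
    """A pre-terminal (POS tag) node has only word children."""
    return all(isinstance(c, str) for c in node[1:])


def internal_nodes(tree):
    """List every non-leaf node of `tree` in pre-order."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def sst_kernel(t1, t2, lam=1.0):
    """Subset-tree kernel (Collins-Duffy): decay-weighted count of shared fragments."""
    memo = {}

    def delta(n1, n2):
        # Weighted number of common fragments rooted at n1 and n2.
        key = (id(n1), id(n2))
        if key in memo:
            return memo[key]
        if production(n1) != production(n2):
            result = 0.0  # different rules: no common fragment rooted here
        elif is_preterminal(n1):
            result = lam  # identical POS -> word rule
        else:
            result = lam
            for c1, c2 in zip(n1[1:], n2[1:]):
                if not isinstance(c1, str):  # word children are already matched by the rule
                    result *= 1.0 + delta(c1, c2)
        memo[key] = result
        return result

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


def normalized_sst_kernel(t1, t2, lam=1.0):
    """Scale the kernel to [0, 1] so trees of different sizes are comparable."""
    return sst_kernel(t1, t2, lam) / math.sqrt(sst_kernel(t1, t1, lam) * sst_kernel(t2, t2, lam))


if __name__ == "__main__":
    a = ("NP", ("D", "the"), ("N", "cat"))
    b = ("NP", ("D", "the"), ("N", "dog"))
    print(sst_kernel(a, a))             # 6.0
    print(sst_kernel(a, b))             # 3.0
    print(normalized_sst_kernel(a, b))  # 0.5
```

To train a classifier, one would compute the matrix of pairwise kernel values over the parse trees of the training texts and pass it to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`), or use a tool with built-in tree kernels such as the SVM-light extension listed as [9]. The decay factor `lam` below 1 down-weights large fragments, which otherwise dominate the count.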
Pages: 386-389
Page count: 4
References (15 in total)
  • [1] Abbasi, A; Chen, HC. Applying authorship analysis to extremist-group web forum messages [J]. IEEE INTELLIGENT SYSTEMS, 2005, 20(05): 67-75
  • [2] [Anonymous], COMPUTATIONAL LINGUI
  • [3] Benajima Y., AMIRA ARABIC TEXT TO
  • [4] Burges, CJC. A tutorial on Support Vector Machines for pattern recognition [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 1998, 2(02): 121-167
  • [5] Diab M., SEMANTIC ROLE LABELI
  • [6] Diab Mona, 2007, P 4 INT WORKSH SEM E, P133
  • [7] Green S., 2010, COLING 2010
  • [8] Joachims T., SVMlight
  • [9] Moschitti A., TREE KERNELS SVM LIG
  • [10] Moschitti A., 2006, 11 C EUR CHAPT ASS C