Authorship verification of e-mail and tweet messages applied for continuous authentication

Cited by: 40
Authors
Brocardo, Marcelo Luiz [1]
Traore, Issa [1]
Woungang, Isaac [2]
Affiliations
[1] Univ Victoria, Dept Elect & Comp Engn, Victoria, BC V8W 3P6, Canada
[2] Ryerson Univ, Dept Comp Sci, Toronto, ON M5B 2K3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Continuous authentication; Stylometry; Short message verification; n-Gram features; Unbalanced dataset; SVM classifier; IDENTIFICATION; ATTRIBUTION;
DOI
10.1016/j.jcss.2014.12.019
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Authorship verification using stylometry consists of identifying a user based on his or her writing style. In this paper, authorship verification is applied to continuous authentication using unstructured online text-based entry. An online document is decomposed into consecutive blocks of short text over which (continuous) authentication decisions are made, discriminating between legitimate and impostor behaviors. We investigate blocks of text with 140, 280 and 500 characters. The feature set includes traditional features, such as lexical, syntactic, and application-specific features, and new features extracted from n-gram analysis. Furthermore, the proposed approach includes a strategy to circumvent issues related to unbalanced datasets, uses Information Gain and Mutual Information for feature selection, and uses a Support Vector Machine (SVM) for classification. Experimental evaluation of the proposed approach on the Enron email and Twitter corpora yields promising results, with an Equal Error Rate (EER) ranging from 9.98% to 21.45% for different block sizes. © 2014 Elsevier Inc. All rights reserved.
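As a rough illustration of the block decomposition and n-gram analysis mentioned in the abstract, the following minimal Python sketch splits a text into consecutive fixed-size character blocks and counts character n-grams in each block. It is not the authors' implementation: the function names `split_into_blocks` and `char_ngram_counts`, the choice of character-level n-grams, and the decision to drop a trailing partial block are all assumptions made for this example.

```python
from collections import Counter


def split_into_blocks(text, block_size):
    """Split text into consecutive, non-overlapping blocks of block_size characters.

    Assumption: a trailing fragment shorter than block_size is discarded.
    """
    last_start = len(text) - block_size
    return [text[i:i + block_size] for i in range(0, last_start + 1, block_size)]


def char_ngram_counts(block, n):
    """Count every character n-gram (sliding window of width n) in a block."""
    return Counter(block[i:i + n] for i in range(len(block) - n + 1))


if __name__ == "__main__":
    # Hypothetical sample text; the paper uses Enron e-mail and Twitter data.
    sample = "the quick brown fox jumps over the lazy dog " * 10
    for size in (140, 280):  # two of the block sizes named in the abstract
        blocks = split_into_blocks(sample, size)
        top = char_ngram_counts(blocks[0], 3).most_common(3)
        print(size, len(blocks), top)
```

In a verification pipeline, per-block counts like these would typically be turned into feature vectors, filtered by a feature selection step, and passed to a classifier; those stages are omitted here because the abstract does not specify them in detail.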
Pages: 1429-1440
Page count: 12