A classifier system for author recognition using synonym-based features

Cited by: 0
Authors
Clark, Jonathan H. [1 ]
Hannon, Charles J. [1 ]
Affiliations
[1] Texas Christian Univ, Dept Comp Sci, Ft Worth, TX 76129 USA
Source
MICAI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE | 2007 / Vol. 4827
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The writing style of an author is a phenomenon that computer scientists and stylometrists have modeled in the past with some success. However, due to the complexity and variability of writing styles, simple models often break down when faced with real-world data. Thus, current trends in stylometry often employ hundreds of features in building classifier systems. In this paper, we present a novel set of synonym-based features for author recognition. We outline a basic model of how synonyms relate to an author's identity and then build two additional models refined to meet real-world needs. Experiments show strong correlation between the presented metric and the writing style of four authors, with the second of the three models outperforming the others. As modern stylometric classifier systems demand increasingly larger feature sets, this new set of synonym-based features will serve to fill this ever-increasing need.
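The abstract does not describe how the synonym-based features are computed. As a rough illustration only, the sketch below shows one way a feature of this general kind could be built: for each group of interchangeable words, it records what share of an author's uses goes to each member. The synonym table `SYNONYM_GROUPS`, the function name `synonym_choice_features`, and the simple whitespace tokenizer are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

# Hypothetical synonym groups for illustration; a real system would
# presumably draw these from a thesaurus or lexical database.
SYNONYM_GROUPS = {
    "big": ("big", "large", "huge"),
    "start": ("start", "begin", "commence"),
}


def synonym_choice_features(text, groups=SYNONYM_GROUPS):
    """For each synonym group, return the share of uses going to each member."""
    # Map every member word back to the name of its group.
    lookup = {word: name for name, words in groups.items() for word in words}
    # Naive tokenizer: split on whitespace, strip punctuation, lowercase.
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    counts = Counter(t for t in tokens if t in lookup)

    features = {}
    for name, words in groups.items():
        total = sum(counts[w] for w in words)
        for w in words:
            # A group the author never uses contributes all zeros.
            features[f"{name}:{w}"] = counts[w] / total if total else 0.0
    return features
```

The resulting dictionary could be fed to any standard classifier as one feature vector per text sample; whether the paper's three models resemble this is not stated in the abstract.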
Pages: 839 / +
Page count: 3