Cross-Language Authorship Attribution

Cited by: 0
Authors
Bogdanova, Dasha [1]
Lazaridou, Angeliki [2]
Affiliations
[1] Dublin City Univ, Sch Comp, CNGL Ctr Global Intelligent Content, Dublin 9, Ireland
[2] Univ Trent, Ctr Mind Brain Sci, I-38100 Trento, Italy
Source
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014
Keywords
Cross-Language Techniques; Authorship Attribution; Text Classification;
DOI
Not available
Chinese Library Classification number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
This paper presents the novel task of cross-language authorship attribution (CLAA), an extension of the authorship attribution task to multilingual settings: given data labelled with authors in language X, the objective is to determine the author of a document written in language Y, where X ≠ Y. We propose a number of cross-language stylometric features for CLAA, such as those based on sentiment and emotional markers. We also explore an approach based on machine translation (MT) with both lexical and cross-language features. We show experimentally that MT can serve as a starting point for CLAA, since it allows good attribution accuracy to be achieved. The cross-language features provide acceptable accuracy when used jointly with MT, though they do not outperform lexical features.
Pages: 2015-2020
Page count: 6