Authorship Attribution Using Entropy

Cited by: 16
Authors
Grabchak, M. [1]
Zhang, Z. [1]
Zhang, D. T. [2]
Affiliations
[1] Univ N Carolina, Charlotte, NC 28223 USA
[2] North Carolina Sch Math & Sci, Durham, NC USA
Keywords
SHAKESPEARE; ESTIMATOR; POEM;
DOI
10.1080/09296174.2013.830551
Chinese Library Classification number
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
We propose a new methodology for testing the authorship of a relatively small work compared with the large body of an author's canon. Our approach is based on comparing the entropy of the two samples. The difficulty lies in the fact that known estimators of entropy tend to have a large bias even when the sample size is fairly large. To deal with this, we suggest splitting the larger sample into several parts, each of length equal to the length of the smaller work. We then propose using these new sub-samples in a simple non-parametric test. We apply our methodology to test whether the poem Shall I Die?, which is sometimes attributed to William Shakespeare, was in fact written by him.
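The splitting idea in the abstract can be illustrated with a minimal sketch. This record does not say which entropy estimator or test statistic the paper uses, so both choices here are assumptions: the plug-in (maximum-likelihood) estimator stands in for the paper's estimator, and a two-sided rank-based p-value stands in for the "simple non-parametric test". The function names `plugin_entropy`, `split_into_subsamples`, and `entropy_rank_test` are hypothetical. Cutting the canon into chunks of the same length as the disputed work means every estimate is computed at the same sample size, so the estimator's bias affects all of them in a comparable way.

```python
import math
from collections import Counter


def plugin_entropy(tokens):
    """Plug-in Shannon entropy (in nats) of a token sequence.

    Assumption: a stand-in estimator, not necessarily the paper's.
    """
    n = len(tokens)
    counts = Counter(tokens)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def split_into_subsamples(tokens, length):
    """Consecutive, non-overlapping chunks of exactly `length` tokens.

    Any leftover tokens at the end are dropped.
    """
    return [tokens[i:i + length]
            for i in range(0, len(tokens) - length + 1, length)]


def entropy_rank_test(canon_tokens, disputed_tokens):
    """Compare the disputed work's entropy with equal-length canon chunks.

    Returns (disputed entropy, list of chunk entropies, p-value).
    The p-value is a two-sided rank-based value with a +1 correction,
    an assumed illustrative choice of non-parametric test.
    """
    m = len(disputed_tokens)
    chunk_entropies = [plugin_entropy(chunk)
                       for chunk in split_into_subsamples(canon_tokens, m)]
    h_disputed = plugin_entropy(disputed_tokens)
    k = len(chunk_entropies)
    below = sum(h <= h_disputed for h in chunk_entropies)
    above = sum(h >= h_disputed for h in chunk_entropies)
    p_value = min(1.0, 2 * (min(below, above) + 1) / (k + 1))
    return h_disputed, chunk_entropies, p_value
```

In use, `canon_tokens` would be the word tokens of the author's undisputed works and `disputed_tokens` those of the short work. A small p-value would suggest that the disputed work's entropy is unusual relative to equal-length stretches of the canon.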
Pages: 301-313
Page count: 13