A Complex Network Approach to Stylometry

Cited by: 58
Author
Amancio, Diego Raphael [1]
Affiliation
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
HUMAN LANGUAGE; ZIPFS; CLASSIFICATION; METRICS
DOI
10.1371/journal.pone.0136076
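Chinese Library Classification number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract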
Statistical methods have been widely employed to study the fundamental properties of language. In recent years, methods from complex and dynamical systems have proved useful for creating several language models. Despite the large number of studies devoted to representing texts with physical models, only a limited number have shown how the properties of the underlying physical systems can be employed to improve the performance of natural language processing tasks. In this paper, I address this problem by devising complex network methods that are able to improve the performance of current statistical methods. Using a fuzzy classification strategy, I show that the topological properties extracted from texts complement the traditional textual description. In several cases, hybrid approaches outperformed both the traditional and the networked methods used alone. Because the proposed model is generic, the framework devised here could be straightforwardly used to study similar textual applications where the topology plays a pivotal role in the description of the interacting agents.
Pages: 21