Complex networks analysis of language complexity

Cited by: 40
Authors
Amancio, Diego R. [1]
Aluisio, Sandra M. [2]
Oliveira, Osvaldo N., Jr. [1]
Costa, Luciano da F. [1]
Affiliations
[1] Univ Sao Paulo, Inst Phys Sao Carlos, POB 369, BR-13560970 Sao Paulo, Brazil
[2] Univ Sao Paulo, Inst Math & Comp Sci, BR-13560970 Sao Paulo, Brazil
Funding
Sao Paulo Research Foundation, Brazil
DOI
10.1209/0295-5075/100/58002
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
Methods from statistical physics, such as those involving complex networks, have been increasingly used in the quantitative analysis of linguistic phenomena. In this paper, we represented pieces of text with different levels of simplification in co-occurrence networks and found that topological regularity correlated negatively with textual complexity. Furthermore, in less complex texts the distance between concepts, represented as nodes, tended to decrease. The complex networks metrics were treated with multivariate pattern recognition techniques, which allowed us to distinguish between original texts and their simplified versions. For each original text, two simplified versions were generated manually with increasing number of simplification operations. As expected, distinction was easier for the strongly simplified versions, where the most relevant metrics were node strength, shortest paths and diversity. Also, the discrimination of complex texts was improved with higher hierarchical network metrics, thus pointing to the usefulness of considering wider contexts around the concepts. Though the accuracy rate in the distinction was not as high as in methods using deep linguistic knowledge, the complex network approach is still useful for a rapid screening of texts whenever assessing complexity is essential to guarantee accessibility to readers with limited reading ability. Copyright (c) EPLA, 2012
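The abstract names co-occurrence networks, node strength and shortest paths without showing how they are computed. The following is a minimal sketch under stated assumptions, not the authors' pipeline: words are split on whitespace, each word is linked to its immediate successor, edges are treated as undirected, and no stopword removal or lemmatization is applied. The function names `cooccurrence_network`, `node_strength` and `mean_shortest_path` are hypothetical.

```python
from collections import defaultdict, deque


def cooccurrence_network(text):
    """Link every word to the word that follows it.

    Assumptions (not taken from the paper): whitespace tokenization,
    a window of one word, undirected edges. The edge weight counts how
    often the two words appear next to each other.
    """
    words = text.lower().split()
    weights = defaultdict(int)
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops from immediately repeated words
            weights[frozenset((a, b))] += 1
    return dict(weights)


def node_strength(weights):
    """Strength of a node = sum of the weights of its edges."""
    strength = defaultdict(int)
    for edge, w in weights.items():
        for node in edge:
            strength[node] += w
    return dict(strength)


def mean_shortest_path(weights):
    """Average hop count over all connected ordered node pairs (BFS)."""
    adjacency = defaultdict(set)
    for edge in weights:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    total = pairs = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    net = cooccurrence_network("the cat saw the dog")
    print(node_strength(net))
    print(mean_shortest_path(net))
```

Computing these values for an original text and for its simplified version gives the kind of feature vector that a pattern recognition method could then compare; the choice of classifier is left open here.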
Pages: 6