Robustness of sentence length measures in written texts

Cited by: 7
Authors
Vieira, Denner S. [1]
Picoli, Sergio [1]
Mendes, Renio S. [1]
Affiliations
[1] Univ Estadual Maringa, Dept Fis, Ave Colombo 5790, BR-87020900 Maringa, Parana, Brazil
Keywords
Sentence length; Time series; Linear correlation; Probability distribution; Auto-correlation; LONG-RANGE CORRELATIONS; HUMAN LANGUAGE; TRANSLATION; ENGLISH
DOI
10.1016/j.physa.2018.04.104
Chinese Library Classification
O4 [Physics]
Discipline classification code
0702
Abstract
Hidden structural patterns in written texts have been the subject of considerable research in recent decades. In particular, mapping a text into a time series of sentence lengths is a natural way to investigate text structure. Typically, sentence length has been quantified by using measures based on the number of words and the number of characters, but other variations are possible. To quantify the robustness of different sentence length measures, we analyzed a database containing about five hundred books in English. For each book, we extracted six distinct measures of sentence length, including the number of words and the number of characters (taking into account lemmatization and stop-word removal). We compared these six measures for each book by using (i) Pearson's coefficient to investigate linear correlations; (ii) the Kolmogorov-Smirnov test to compare distributions; and (iii) detrended fluctuation analysis (DFA) to quantify auto-correlations. We have found that all six measures exhibit very similar behavior, suggesting that sentence length is a robust measure related to text structure. (C) 2018 Elsevier B.V. All rights reserved.
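The mapping from a text to sentence-length series, and comparison (i) above, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the regex-based sentence splitter, the helper names (split_sentences, length_in_words, length_in_chars, pearson), and the sample text are all assumptions made here, and only two of the six measures (plain word count and letter count, without lemmatization or stop-word removal) are shown.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def length_in_words(sentence):
    # Count runs of letters/apostrophes as words.
    return len(re.findall(r"[A-Za-z']+", sentence))


def length_in_chars(sentence):
    # Count alphabetic characters only (spaces and punctuation ignored).
    return sum(1 for ch in sentence if ch.isalpha())


def pearson(x, y):
    # Pearson's linear correlation coefficient between two equal-length series.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical sample text, used only to exercise the functions.
text = (
    "Call me Ishmael. Some years ago I went to sea. "
    "It was a damp, drizzly November in my soul! Why not?"
)

sentences = split_sentences(text)
words = [length_in_words(s) for s in sentences]   # series of lengths in words
chars = [length_in_chars(s) for s in sentences]   # series of lengths in letters
r = pearson(words, chars)

print(words, chars, round(r, 3))
```

For a real book, the two series would have thousands of entries, and a coefficient close to 1 would indicate that the word-based and character-based measures carry nearly the same information, which is the kind of robustness the abstract reports.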
Pages: 749-754
Page count: 6