Lognormality of the Distribution of Japanese Sentence Lengths

Cited by: 5
Authors
Furuhashi, Sho [1]
Hayakawa, Yoshinori [2]
Affiliations
[1] Tohoku Univ, Dept Phys, Sendai, Miyagi 9808578, Japan
[2] Tohoku Univ, Ctr Informat Technol Educ, Sendai, Miyagi 9808576, Japan
Keywords
sentence length; multiplicative process; Japanese dependency structure; lognormal distribution; dependency tree; language
DOI
10.1143/JPSJ.81.034004
Chinese Library Classification number
O4 [Physics]
Subject classification code
0702
Abstract
The lengths of sentences in written texts have been reported to exhibit characteristic distributions that resemble lognormal distributions. However, the mechanism responsible for such lognormality is unclear. In this quantitative study, we analyze over 10,000 Japanese sentences from out-of-copyright Japanese texts stored on Aozora Bunko. We first confirm that sentence length distributions can be better represented by the lognormal function than by other functions (e.g., the gamma distribution). Next, under the assumption that each sentence is generated by a hierarchical branching process in terms of dependency trees, we test whether the composition of sentences can be explained by a simple multiplicative process by utilizing the Japanese dependency analyzer CaboCha. The results imply that the lognormality of sentence length distributions originates from the dependency tree depth and that a simple multiplicative model cannot accurately model the processes involved in generating sentences.
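As background for the multiplicative-process hypothesis tested in the abstract, the following minimal sketch shows why a product of independent positive factors tends toward a lognormal distribution: the logarithm of the product is a sum, which the central limit theorem pushes toward a normal distribution. This is an illustration only and not the authors' model; the abstract itself reports that a simple multiplicative model does not accurately describe sentence generation. The function names, the uniform factor range (0.5 to 2.0), and the fixed depth of 8 are all assumptions chosen for the example.

```python
import math
import random
import statistics


def simulate_lengths(n_sentences, depth, seed=0):
    """Toy multiplicative process (hypothetical, not the paper's model).

    Each simulated "length" is the product of `depth` independent
    factors drawn uniformly from [0.5, 2.0].
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_sentences):
        length = 1.0
        for _ in range(depth):
            length *= rng.uniform(0.5, 2.0)
        lengths.append(length)
    return lengths


def skewness(xs):
    """Population skewness: the mean of the cubed z-scores."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)


lengths = simulate_lengths(10_000, depth=8)
log_lengths = [math.log(x) for x in lengths]

raw_skew = skewness(lengths)      # strongly right-skewed
log_skew = skewness(log_lengths)  # much closer to symmetric

print(f"skewness of lengths:      {raw_skew:.2f}")
print(f"skewness of log(lengths): {log_skew:.2f}")
```

The raw values have a long right tail, while their logarithms are much closer to symmetric, which is the qualitative signature of lognormality. A real analysis would replace the simulated values with measured sentence lengths and compare fitted lognormal and gamma distributions directly.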
Pages: 5