Using the Google N-Gram corpus to measure cultural complexity

Cited by: 32

Authors:

Juola, Patrick [1]

Affiliations:

[1] Duquesne Univ, Pittsburgh, PA 15219 USA

Source:

LITERARY AND LINGUISTIC COMPUTING | 2013 / Vol. 28 / Issue 04

Funding:

National Science Foundation (USA);

DOI:

10.1093/llc/fqt017

Chinese Library Classification (CLC) number:

H0 [Linguistics];

Subject classification codes:

030303 ; 0501 ; 050102 ;

Abstract:

Empirical studies of broad-ranging aspects of culture, such as 'cultural complexities', are often extremely difficult. Following the model of Michel et al. (Michel, J.-B., Shen, Y. K., Aiden, A. P. et al. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014): 176-82), and using a set of techniques originally developed to measure the complexity of language, we propose a text-based analysis of a large corpus of topic-uncontrolled text to determine how cultural complexity varies over time within a single culture. Using the Google Books American 2Gram corpus, we are able to show that, as predicted from the cumulative nature of culture, US culture has been steadily increasing in complexity, even when (for economic reasons) the amount of actual discourse, as measured by publication volume, decreases. We discuss several implications of this novel analysis technique, including its implications for discussion of the meaning of 'culture.'

References

Pages: 668-675

Page count: 8

28 entries in total

[1]

[Anonymous], 1992, LINGUISTIC DIVERSITY, DOI 10.7208/CHICAGO/9780226580593.001.0001

[2]

Baker Mona., 1993, TEXT TECHNOLOGY HONO, P233, DOI 10.1075/z.64.15bak

[3]

Berlin B., 1969, BASIC COLOR TERMS TH

[4]

Brown P. F., 1992, Computational Linguistics, V18, P31

[5]

Chaitin G. J., 1996, COMPLEXITY, V1, P55

[6] Cultural complexity revisited [J].

Denton, T.

CROSS-CULTURAL RESEARCH, 2004, 38 (01) :3-26

[7] The time course of language change [J].

Juola, P.

COMPUTERS AND THE HUMANITIES, 2003, 37 (01) :77-96

[8]

Juola P., 2012, P CHIC C DIG HUM COM

[9]

Juola P., 1998, P 20 ANN C COGN SCI

[10]

Juola P., 1997, P 2 EUR C COGN SCI, P207
