Using the Google N-Gram corpus to measure cultural complexity

Cited by: 32
Authors
Juola, Patrick [1]
Affiliation
[1] Duquesne Univ, Pittsburgh, PA 15219 USA
Source
LITERARY AND LINGUISTIC COMPUTING | 2013 / Vol. 28 / Issue 04
Funding
National Science Foundation (USA);
DOI
10.1093/llc/fqt017
Chinese Library Classification number
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
Empirical studies of broad-ranging aspects of culture, such as 'cultural complexity', are often extremely difficult. Following the model of Michel et al. (Michel, J.-B., Shen, Y. K., Aiden, A. P. et al. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014): 176-82), and using a set of techniques originally developed to measure the complexity of language, we propose a text-based analysis of a large corpus of topic-uncontrolled text to determine how cultural complexity varies over time within a single culture. Using the Google Books American 2Gram corpus, we are able to show that (as predicted from the cumulative nature of culture) US culture has been steadily increasing in complexity, even when (for economic reasons) the amount of actual discourse, as measured by publication volume, decreases. We discuss several implications of this novel analysis technique, as well as its implications for discussion of the meaning of 'culture.'
Pages: 668-675
Page count: 8
Related papers
28 in total
[1]  
[Anonymous], 1992, LINGUISTIC DIVERSITY, DOI 10.7208/CHICAGO/9780226580593.001.0001
[2]  
Baker Mona, 1993, TEXT TECHNOLOGY HONO, P233, DOI 10.1075/z.64.15bak
[3]  
Berlin B., 1969, BASIC COLOR TERMS TH
[4]  
Brown P. F., 1992, Computational Linguistics, V18, P31
[5]  
Chaitin G. J., 1996, COMPLEXITY, V1, P55
[6]   Cultural complexity revisited [J].
Denton, T .
CROSS-CULTURAL RESEARCH, 2004, 38 (01) :3-26
[7]   The time course of language change [J].
Juola, P .
COMPUTERS AND THE HUMANITIES, 2003, 37 (01) :77-96
[8]  
Juola P., 2012, P CHIC C DIG HUM COM
[9]  
Juola P., 1998, P 20 ANN C COGN SCI
[10]  
Juola P., 1997, P 2 EUR C COGN SCI, P207