SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles

Times cited: 617
Authors
Cai, Qing [1]
Brysbaert, Marc [1]
Affiliation
[1] Univ Ghent, Dept Expt Psychol, B-9000 Ghent, Belgium
Source
PLOS ONE | 2010 / Vol. 5 / Issue 6
Keywords
ENGLISH; LEXICON; NORMS
DOI
10.1371/journal.pone.0010729
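Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Subject classification codes
07; 0710; 09
Abstract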
Background: Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, only a few sources of word frequencies are available to researchers, and their quality falls short of what researchers working on other languages are used to.
Methodology: Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.
Conclusions: Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and for the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.
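The abstract describes two kinds of norms derived from the subtitle corpus: raw frequencies (commonly reported per million tokens) and contextual diversity (CD), here taken to be the number of films in which an item occurs. The following is a minimal sketch of how both could be computed, assuming the subtitle files have already been segmented into words. The segmentation step and the function names `frequency_norms` and `to_characters` are hypothetical illustrations, not the authors' actual pipeline.

```python
from collections import Counter


def frequency_norms(films):
    """Frequency and contextual-diversity norms for a segmented corpus.

    films: list of films, each a list of word tokens (assumed pre-segmented;
    the segmentation tool is an assumption, not taken from the paper).
    Returns {item: (count, per_million, cd, cd_percent)}, where cd is the
    number of films containing the item.
    """
    counts = Counter()  # total occurrences across the whole corpus
    cd = Counter()      # number of films in which each item appears
    for tokens in films:
        counts.update(tokens)
        cd.update(set(tokens))  # a film contributes at most 1 to CD
    total = sum(counts.values())
    n_films = len(films)
    if total == 0:
        return {}
    return {
        item: (c, c * 1_000_000 / total, cd[item], cd[item] * 100 / n_films)
        for item, c in counts.items()
    }


def to_characters(films):
    """Re-tokenize each film into single characters for character norms."""
    return [[ch for token in tokens for ch in token] for tokens in films]


if __name__ == "__main__":
    # Toy corpus of two "films"; real input would be segmented subtitle files.
    films = [["我们", "走", "吧"], ["我们", "好"]]
    for word, norm in frequency_norms(films).items():
        print(word, norm)
    for char, norm in frequency_norms(to_characters(films)).items():
        print(char, norm)
```

Counting each film at most once (via `set(tokens)`) is what separates CD from raw frequency: a word repeated many times in a single film raises its count but not its CD. Splitting multi-character words such as 我们 into 我 and 们 yields character norms from the same corpus.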
Pages: 8
Related papers (28 in total)
[1] Contextual diversity, not word frequency, determines word-naming and lexical decision times [J]. Adelman, James S.; Brown, Gordon D. A.; Quesada, Jose F. PSYCHOLOGICAL SCIENCE, 2006, 17(09): 814-823
[2] [Anonymous], ACM T SPEECH LANG PR, DOI 10.1145/1363108.1363109
[3] [Anonymous], 1993, The CELEX Lexical Database (Release 1) CD-ROM
[4] BAI X, 2008, J EXP PSYCHOL HUM PE, V34
[5] Visual word recognition of single-syllable words [J]. Balota, DA; Cortese, MJ; Sergent-Marshall, SD; Spieler, DH; Yap, MJ. JOURNAL OF EXPERIMENTAL PSYCHOLOGY-GENERAL, 2004, 133(02): 283-316
[6] The English Lexicon Project [J]. Balota, David A.; Yap, Melvin J.; Cortese, Michael J.; Hutchison, Keith A.; Kessler, Brett; Loftis, Bjorn; Neely, James H.; Nelson, Douglas L.; Simpson, Greg B.; Treiman, Rebecca. BEHAVIOR RESEARCH METHODS, 2007, 39(03): 445-459
[7] Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English [J]. Brysbaert, Marc; New, Boris. BEHAVIOR RESEARCH METHODS, 2009, 41(04): 977-990
[8] Feng Z.W, 2006, INT J CORPUS LINGUIS, V11, P173, DOI 10.1075/IJCL.11.2.03FEN
[9] Gulikers L., 1995, CELEX LEXICAL DATABA
[10] Keuleers E., BEHAV RES M IN PRESS