The use of film subtitles to estimate word frequencies

Cited by: 214
Authors
New, Boris
Brysbaert, Marc
Veronis, Jean
Pallier, Christophe
Affiliations
[1] Royal Holloway Univ London, London, England
[2] Univ Aix Marseille 1, F-13331 Marseille 3, France
[3] INSERM, CNRS, F-75654 Paris 13, France
[4] Serv Hosp Frederic Joliot, Orsay, France
DOI
10.1017/S014271640707035X
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
We examine the use of film subtitles as an approximation of word frequencies in human interactions. Because subtitle files are widely available on the Internet, they may present a fast and easy way to obtain word frequency measures in language registers other than text writing. We compiled a corpus of 52 million French words, coming from a variety of films. Frequency measures based on this corpus compared well to other spoken and written frequency measures, and explained variance in lexical decision times in addition to what is accounted for by the available French written frequency measures.
Pages: 661-677
Page count: 17
Related papers
13 in total
  • [1] [Anonymous], 1995, CELEX LEXICAL DATABA
  • [2] Morphological influences on the recognition of monosyllabic monomorphemic words
    Baayen, R. H.
    Feldman, L. B.
    Schreuder, R.
    [J]. JOURNAL OF MEMORY AND LANGUAGE, 2006, 55 (02) : 290 - 313
  • [3] Visual word recognition of single-syllable words
    Balota, DA
    Cortese, MJ
    Sergent-Marshall, SD
    Spieler, DH
    Yap, MJ
    [J]. JOURNAL OF EXPERIMENTAL PSYCHOLOGY-GENERAL, 2004, 133 (02) : 283 - 316
  • [4] BALOTA DA, IN PRESS BEHAV RES M
  • [5] Using Internet search engines to estimate word frequency
    Blair, IV
    Urland, GR
    Ma, JE
    [J]. BEHAVIOR RESEARCH METHODS INSTRUMENTS & COMPUTERS, 2002, 34 (02) : 286 - 290
  • [6] Bonin P, 2001, CAH PSYCHOL COGN, V20, P401
  • [7] Relative clause attachment in Dutch: On-line comprehension corresponds to corpus frequencies when lexical variables are taken into account
    Desmet, Timothy
    De Baecke, Constantijn
    Drieghe, Denis
    Brysbaert, Marc
    Vonk, Wietske
    [J]. LANGUAGE AND COGNITIVE PROCESSES, 2006, 21 (04) : 453 - 485
  • [8] Deygers K., 2000, NEDERLANDSE TAALKUND, V5, P356
  • [9] Equipe DELIC, 2004, RECHERCHES FRANCAIS, V18, P11
  • [10] New B, 2001, ANN PSYCHOL, V101, P447