The study explores the expression of time in English and Czech children's fiction using n-gram extraction. This raises the methodological question of what n-gram-based approaches contribute to language comparison. We extract 2- to 5-grams (i.e. continuous sequences of two to five words) from comparable corpora of English and Czech children's fiction. The consistently higher type/token ratios in Czech point to greater variability in Czech, a language characterized by morphological variability and free word order. The qualitative part of the analysis focuses on n-grams relating to time. While n-grams proved a useful starting point for cross-linguistic analysis, highlighting typological characteristics of the two languages, the study suggests that more flexible units may be needed to explore the means of expressing time. We propose relying on patterns that are based on partly lemmatised frequent n-grams and admit some variation.
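The extraction step and the type/token comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `extract_ngrams` and `type_token_ratio`, the whitespace tokenisation, and the toy sentence are all assumptions made for illustration only.

```python
from collections import Counter


def extract_ngrams(tokens, n_min=2, n_max=5):
    """Count continuous n-grams for every n from n_min to n_max (hypothetical helper)."""
    counts = {n: Counter() for n in range(n_min, n_max + 1)}
    for n in counts:
        # Slide a window of width n across the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def type_token_ratio(counter):
    """Distinct n-grams (types) divided by all n-gram occurrences (tokens)."""
    total = sum(counter.values())
    return len(counter) / total if total else 0.0


# Toy input; a real study would use properly tokenised corpus text.
tokens = "once upon a time there was a time".split()
counts = extract_ngrams(tokens)

for n, counter in counts.items():
    print(n, len(counter), sum(counter.values()), round(type_token_ratio(counter), 3))
```

Under these assumptions, a corpus whose word forms vary more (for instance through inflection) would yield more distinct n-grams per occurrence, and hence a higher ratio, which is the kind of contrast the abstract reports for Czech.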
Affiliations:
Macquarie Univ, Dept Linguist, Sydney, NSW, Australia
North West Univ, Vaal Triangle Campus, Vanderbijlpark, South Africa