Cifu: a frequency lexicon of Hong Kong Cantonese

Cited by: 0
Authors
Lai, Regine [1]
Winterstein, Gregoire
Affiliations
[1] Chinese Univ Hong Kong, Dept Linguist & Modern Languages, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020) | 2020
Keywords
Hong Kong Cantonese; lexicon; lexical frequency; lexical neighborhood; neighborhood density; PHONOLOGICAL-NEIGHBORHOOD; SIMILARITY NEIGHBORHOODS; WORD-FREQUENCY; PHONOTACTICS; COMPETITION; ACTIVATION; NETWORK; SPEECH;
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
This paper introduces Cifu, a lexical database for Hong Kong Cantonese (HKC) that offers phonological and orthographic information, frequency measures, and lexical neighborhood information for lexical items in HKC. The resource can be used for NLP applications and for the design and analysis of psycholinguistic experiments on HKC. We elaborate on the characteristics and challenges specific to HKC that were relevant in the design of Cifu. These include lexical, orthographic and phonological aspects of HKC, word segmentation issues, the place of HKC in written media, and the availability of data. We discuss the measure of Neighborhood Density (ND), highlighting how the analytic nature of Cantonese and its writing system affect that measure. We justify using six different variations of ND, based on the possibility of inserting or deleting phonemes when searching for neighbors and on the choice of data for retrieving frequencies. Statistics about the four genres (written, adult spoken, children spoken and child-directed) within the dataset are discussed. We find that the lexical diversity of the child-directed speech genre is particularly low compared to a size-matched written corpus. The correlations of word frequencies across genres are all high, but in general decrease as word length increases.
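To make the insertion/deletion distinction in the abstract concrete, here is a minimal Python sketch of two ways to count phonological neighbors: by substitution only, or by any single phoneme edit (substitution, insertion or deletion). It is an illustration under stated assumptions, not the authors' implementation. The names `is_neighbor`, `neighborhood_density` and `TOY_LEXICON` are hypothetical, the toy entries are written in a Jyutping-like romanization and are not taken from Cifu, and tones and frequency weighting are ignored.

```python
from typing import Dict, Tuple

# A word form is represented as a tuple of phoneme symbols (assumption: tones ignored).
Phonemes = Tuple[str, ...]

# Hypothetical toy lexicon for illustration only; not data from Cifu.
TOY_LEXICON: Dict[str, Phonemes] = {
    "saam": ("s", "aa", "m"),
    "saan": ("s", "aa", "n"),
    "saa": ("s", "aa"),
    "taam": ("t", "aa", "m"),
    "aam": ("aa", "m"),
    "gwong": ("gw", "o", "ng"),
}


def is_neighbor(a: Phonemes, b: Phonemes, allow_indel: bool) -> bool:
    """Return True if a and b differ by exactly one phoneme edit."""
    if a == b:
        return False  # a form is not its own neighbor
    if len(a) == len(b):
        # Same length: neighbors only if exactly one position differs (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if allow_indel and abs(len(a) - len(b)) == 1:
        short, long_ = (a, b) if len(a) < len(b) else (b, a)
        # Deleting one phoneme from the longer form must yield the shorter one.
        return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))
    return False


def neighborhood_density(
    target: Phonemes, lexicon: Dict[str, Phonemes], allow_indel: bool
) -> int:
    """Count the lexicon entries that are neighbors of the target form."""
    return sum(is_neighbor(target, form, allow_indel) for form in lexicon.values())


target = TOY_LEXICON["saam"]
nd_substitution_only = neighborhood_density(target, TOY_LEXICON, allow_indel=False)
nd_with_indel = neighborhood_density(target, TOY_LEXICON, allow_indel=True)
```

In this toy lexicon the substitution-only count for `saam` is 2 (`saan`, `taam`), and allowing insertions and deletions raises it to 4 by adding `saa` and `aam`. A frequency-weighted variant would additionally need per-genre frequency counts, which this sketch does not model.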
Pages: 3069-3077
Number of pages: 9