The Hebrew CHILDES corpus: transcription and morphological analysis

Authors
Aviad Albert
Brian MacWhinney
Bracha Nir
Shuly Wintner
Affiliations
[1] Tel Aviv University, Department of Linguistics
[2] Carnegie Mellon University, Department of Psychology
[3] University of Haifa, Department of Communication Sciences and Disorders
[4] University of Haifa, Department of Computer Science
Source
Language Resources and Evaluation, 2013, Vol. 47 (4)
Keywords
CHILDES; Hebrew; Transcription of spoken language; Morphological analysis; Morphological disambiguation
Abstract
We present a corpus of transcribed spoken Hebrew that reflects spoken interactions between children and adults. The corpus is an integral part of the CHILDES database, which distributes similar corpora for over 25 languages. We introduce a dedicated transcription scheme for the spoken Hebrew data that is sensitive to both the phonology and the standard orthography of the language. We also introduce a morphological analyzer that was specifically developed for this corpus. The analyzer adequately covers the entire corpus, producing detailed correct analyses for all tokens. Evaluation on a new corpus reveals high coverage as well. Finally, we describe a morphological disambiguation module that selects the correct analysis of each token in context. The result is a high-quality morphologically-annotated CHILDES corpus of Hebrew, along with a set of tools that can be applied to new corpora.
Pages: 973-1005
Number of pages: 32