The Hebrew CHILDES corpus: transcription and morphological analysis

Authors:

Aviad Albert

Brian MacWhinney

Bracha Nir

Shuly Wintner

Affiliations:

[1] Tel Aviv University, Department of Linguistics

[2] Carnegie Mellon University, Department of Psychology

[3] University of Haifa, Department of Communication Sciences and Disorders

[4] University of Haifa, Department of Computer Science

Source:

Language Resources and Evaluation | 2013 / Vol. 47, No. 4

Keywords:

CHILDES; Hebrew; Transcription of spoken language; Morphological analysis; Morphological disambiguation

Abstract:

We present a corpus of transcribed spoken Hebrew that reflects spoken interactions between children and adults. The corpus is an integral part of the CHILDES database, which distributes similar corpora for over 25 languages. We introduce a dedicated transcription scheme for the spoken Hebrew data that is sensitive to both the phonology and the standard orthography of the language. We also introduce a morphological analyzer that was specifically developed for this corpus. The analyzer adequately covers the entire corpus, producing detailed correct analyses for all tokens. Evaluation on a new corpus reveals high coverage as well. Finally, we describe a morphological disambiguation module that selects the correct analysis of each token in context. The result is a high-quality morphologically-annotated CHILDES corpus of Hebrew, along with a set of tools that can be applied to new corpora.

Pages: 973-1005

Page count: 32
