Analyzing and identifying multiword expressions in spoken language

Cited by: 3
Authors
Strik, Helmer [1]
Hulsbosch, Micha [1]
Cucchiarini, Catia [1]
Affiliations
[1] Radboud Univ Nijmegen, Dept Linguist, Sect Language & Speech, NL-6500 HD Nijmegen, Netherlands
Keywords
Multiword expressions; Spoken language; Transcription; Pronunciation reduction; Identification; WORD; LEARNERS; FLUENCY
DOI
10.1007/s10579-009-9095-y
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even syllables are deleted. Several measures of pronunciation reduction are calculated for these two MWEs and for all other utterances in the corpus. Five of these measures are more than twice as high for the MWEs, thus indicating considerable reduction. One overall measure of pronunciation deviation is then calculated and used to automatically identify MWEs in a large speech corpus. The results show that neither this overall measure nor frequency of co-occurrence alone is suitable for identifying MWEs. The best results are obtained by using a metric that combines overall pronunciation reduction with weighted frequency. In this way, recurring "islands of pronunciation reduction" that contain (potential) MWEs can be identified in a large speech corpus.
Pages: 41-58
Number of pages: 18
Related papers
50 in total
  • [1] Analyzing and identifying multiword expressions in spoken language
    Helmer Strik
    Micha Hulsbosch
    Catia Cucchiarini
    Language Resources and Evaluation, 2010, 44: 41-58
  • [2] Multiword Expressions in Child Language
    Wilkens, Rodrigo
    Idiart, Marco
    Villavicencio, Aline
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016: 2307-2311
  • [3] Identifying Bengali Multiword Expressions using semantic clustering
    Chakraborty, Tanmoy
    Das, Dipankar
    Bandyopadhyay, Sivaji
    LINGUISTICAE INVESTIGATIONES, 2014, 37 (01): 106-128
  • [4] Multiword Expressions Resources for Italian: Presenting a Manually Annotated Spoken Corpus
    Manfredi, Ilaria
    TEXT, SPEECH, AND DIALOGUE, TSD 2024, PT II, 2024, 15049: 130-138
  • [5] Multiword Expressions (MWE) for Mizo Language: Literature Survey
    Majumder, Goutam
    Pakray, Partha
    Khiangte, Zoramdinthara
    Gelbukh, Alexander
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2016), PT I, 2018, 9623: 623-635
  • [6] Multiword Expressions and Lexicalism
    Findlay, Jamie Y.
    PROCEEDINGS OF LFG'17 CONFERENCE, 2017: 209-229
  • [7] Discovering multiword expressions
    Villavicencio, Aline
    Idiart, Marco
    NATURAL LANGUAGE ENGINEERING, 2019, 25 (06): 715-733
  • [8] Prepositional multiword expressions
    Ivankovic, Ivana Matas
    RASPRAVE, 2016, 42 (02): 543-562
  • [9] Cross-Language Influences in the Processing of Multiword Expressions: From a First Language to Second and Back
    Du, Lingli
    Elgort, Irina
    Siyanova-Chanturia, Anna
    FRONTIERS IN PSYCHOLOGY, 2021, 12
  • [10] Identification of Multiword Expressions in the brWaC
    Scheller Boos, Rodrigo Augusto
    Prestes, Kassius Vargas
    Villavicencio, Aline
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014: 728-735