Analyzing and identifying multiword expressions in spoken language

Cited by: 3
Authors
Strik, Helmer [1]
Hulsbosch, Micha [1]
Cucchiarini, Catia [1]
Affiliation
[1] Radboud Univ Nijmegen, Dept Linguist, Sect Language & Speech, NL-6500 HD Nijmegen, Netherlands
Keywords
Multiword expressions; Spoken language; Transcription; Pronunciation reduction; Identification; WORD; LEARNERS; FLUENCY;
DOI
10.1007/s10579-009-9095-y
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even syllables are deleted. Several measures of pronunciation reduction are calculated for these two MWEs and for all other utterances in the corpus. Five of these measures are more than twice as high for the MWEs, thus indicating considerable reduction. One overall measure of pronunciation deviation is then calculated and used to automatically identify MWEs in a large speech corpus. The results show that neither this overall measure nor frequency of co-occurrence alone is suitable for identifying MWEs. The best results are obtained by using a metric that combines overall pronunciation reduction with weighted frequency. In this way, recurring "islands of pronunciation reduction" that contain (potential) MWEs can be identified in a large speech corpus.
Pages: 41-58
Page count: 18