Analyzing and identifying multiword expressions in spoken language

Cited by: 3
Authors
Strik, Helmer [1]
Hulsbosch, Micha [1]
Cucchiarini, Catia [1]
Affiliation
[1] Radboud Univ Nijmegen, Dept Linguist, Sect Language & Speech, NL-6500 HD Nijmegen, Netherlands
Keywords
Multiword expressions; Spoken language; Transcription; Pronunciation reduction; Identification; WORD; LEARNERS; FLUENCY;
DOI
10.1007/s10579-009-9095-y
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even syllables are deleted. Several measures of pronunciation reduction are calculated for these two MWEs and for all other utterances in the corpus. Five of these measures are more than twice as high for the MWEs, thus indicating considerable reduction. One overall measure of pronunciation deviation is then calculated and used to automatically identify MWEs in a large speech corpus. The results show that neither this overall measure nor frequency of co-occurrence alone is suitable for identifying MWEs. The best results are obtained by using a metric that combines overall pronunciation reduction with weighted frequency. In this way, recurring "islands of pronunciation reduction" that contain (potential) MWEs can be identified in a large speech corpus.
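The abstract does not give the formula for the combined metric, so the following is only a minimal sketch of the general idea, under stated assumptions: pronunciation reduction is approximated by the share of canonical phones missing from the realized transcription, and frequency is weighted with a logarithm. The function name `score_ngrams`, the token layout, and both of these choices are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def score_ngrams(tokens, n=2):
    """Rank word n-grams by mean reduction times log-weighted frequency.

    tokens: list of (word, canonical_phones, realized_phones) tuples, where
    the phone sequences are strings or lists (hypothetical input layout).
    Returns a dict mapping each word n-gram to its score.
    """
    # n-gram -> [occurrence count, summed reduction]
    stats = defaultdict(lambda: [0, 0.0])
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        key = tuple(word for word, _, _ in window)
        canonical = sum(len(c) for _, c, _ in window)
        realized = sum(len(r) for _, _, r in window)
        # Assumed reduction proxy: fraction of canonical phones not realized.
        reduction = max(0, canonical - realized) / canonical if canonical else 0.0
        stats[key][0] += 1
        stats[key][1] += reduction
    # Assumed weighting: mean reduction scaled by log(1 + frequency).
    return {
        key: (total / count) * math.log(1 + count)
        for key, (count, total) in stats.items()
    }
```

Under these assumptions, an n-gram scores highly only when it is both frequent and consistently reduced, which mirrors the abstract's finding that neither reduction nor co-occurrence frequency alone is sufficient.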
Pages: 41-58
Page count: 18
Related papers
50 in total
  • [31] A Corpus Study of Verbal Multiword Expressions in Brazilian Portuguese
    Ramisch, Carlos
    Ramisch, Renata
    Zilio, Leonardo
    Villavicencio, Aline
    Cordeiro, Silvio
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2018, 2018, 11122 : 24 - 34
  • [32] Extracting multiword expressions from texts with the aid of online resources A classroom experiment
    Bui, Thuy
    Boers, Frank
    Coxhead, Averil
    ITL-INTERNATIONAL JOURNAL OF APPLIED LINGUISTICS, 2020, 171 (02) : 221 - 252
  • [33] Identification of Multiword Expressions in Tweets for Hate Speech Detection
    Zampieri, Nicolas
    Ramisch, Carlos
    Illina, Irina
    Fohr, Dominique
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 202 - 210
  • [34] Comprehensive Annotation of Multiword Expressions in a Social Web Corpus
    Schneider, Nathan
    Onuffer, Spencer
    Kazour, Nora
    Danchik, Emily
    Mordowanec, Michael T.
    Conrad, Henrietta
    Smith, Noah A.
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 455 - 461
  • [35] Towards Comprehensive Computational Representations of Arabic Multiword Expressions
    Alghamdi, Ayman
    Atwell, Eric
    COMPUTATIONAL AND CORPUS-BASED PHRASEOLOGY, EUROPHRAS 2017, 2017, 10596 : 415 - 431
  • [36] Using Semantic Clustering for Detecting Bengali Multiword Expressions
    Chakraborty, Tanmoy
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2014, 38 (02) : 103 - 113
  • [37] Identification of Nominal Multiword Expressions in Bengali Using CRF
    Chakraborty, Tanmoy
    4TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN COMPUTER INTERACTION (IHCI 2012), 2012,
  • [38] Helping learners develop autonomy in acquiring multiword expressions
    Boers, Frank
    Bui, Thuy
    Deconinck, Julie
    Stengers, Helene
    Coxhead, Averil
    MODERN LANGUAGE JOURNAL, 2023, 107 (01) : 222 - 241
  • [39] Multiword expressions processing in Galician using Deep Learning
    Darriba, Victor
    Doval, Yerai
    Kuriyozov, Elmurod
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2021, (67) : 45 - 57
  • [40] Identification and translation of verb+noun Multiword Expressions: A Spanish-Basque study
    Inurrieta, Uxoa
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2020, (64) : 123 - 126