Prosodic and temporal features for language modeling for dialog

Citations: 10
Authors
Ward, Nigel G. [1]
Vega, Alejandro [1]
Baumann, Timo [2]
Affiliations
[1] Univ Texas El Paso, El Paso, TX 79968 USA
[2] Univ Potsdam, Dept Linguist, D-14476 Potsdam, Germany
Funding
National Science Foundation (USA)
Keywords
Dialog dynamics; Dialog state; Prosody; Interlocutor behavior; Word probabilities; Prediction; Perplexity; Speech recognition; Switchboard corpus; Verbmobil corpus
DOI
10.1016/j.specom.2011.07.009
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
If we can model the cognitive and communicative processes underlying speech, we should be able to better predict what a speaker will do. With this idea as inspiration, we examine a number of prosodic and timing features as potential sources of information on what words the speaker is likely to say next. In spontaneous dialog we find that word probabilities do vary with such features. Using perplexity as the metric, the most informative of these features included recent speaking rate, volume, and pitch, and time until end of utterance. Using simple combinations of such features to augment trigram language models gave up to an 8.4% perplexity benefit on the Switchboard corpus, and up to a 1.0% relative reduction in word error rate (0.3% absolute) on the Verbmobil II corpus. (C) 2011 Elsevier B.V. All rights reserved.
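The abstract reports its results as a perplexity benefit and as relative versus absolute word error rate (WER) reductions. As a reading aid, the following is a minimal sketch of how those quantities relate. The function names and the sample numbers are illustrative assumptions and are not taken from the paper; only the 8.4%, 1.0%, and 0.3% figures come from the abstract.

```python
import math


def perplexity(word_probs):
    """Perplexity over a word sequence, given the probability the model
    assigned to each word that was actually spoken."""
    mean_log2 = sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** (-mean_log2)


def relative_reduction(baseline, improved):
    """Fractional improvement of `improved` over `baseline`
    (applies equally to perplexity or WER)."""
    return (baseline - improved) / baseline


def implied_baseline(absolute_drop, relative_drop):
    """Baseline value implied by an absolute drop and the matching
    relative drop: relative = absolute / baseline."""
    return absolute_drop / relative_drop


# Hypothetical values: a model that gives every word probability 0.25
# is as uncertain as a uniform choice among 4 words.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])

# Hypothetical baseline perplexity of 100 reduced to 91.6 is an 8.4% benefit.
benefit = relative_reduction(100.0, 91.6)

# Inference from the abstract's rounded figures only (not a reported value):
# a 0.3-point absolute drop that equals a 1.0% relative drop.
baseline_wer = implied_baseline(0.3, 0.01)
```

Under these assumptions, the 0.3% absolute and 1.0% relative WER figures are consistent with a baseline WER on the order of 30%. Because both figures are rounded in the abstract, this is only a rough estimate.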
Pages: 161-174
Page count: 14