Prosodic and temporal features for language modeling for dialog

Cited by: 10
Authors
Ward, Nigel G. [1 ]
Vega, Alejandro [1 ]
Baumann, Timo [2 ]
Affiliations
[1] Univ Texas El Paso, El Paso, TX 79968 USA
[2] Univ Potsdam, Dept Linguist, D-14476 Potsdam, Germany
Funding
National Science Foundation (USA)
Keywords
Dialog dynamics; Dialog state; Prosody; Interlocutor behavior; Word probabilities; Prediction; Perplexity; Speech recognition; Switchboard corpus; Verbmobil corpus
DOI
10.1016/j.specom.2011.07.009
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
If we can model the cognitive and communicative processes underlying speech, we should be able to better predict what a speaker will do. With this idea as inspiration, we examine a number of prosodic and timing features as potential sources of information on what words the speaker is likely to say next. In spontaneous dialog we find that word probabilities do vary with such features. Using perplexity as the metric, the most informative of these features included recent speaking rate, volume, and pitch, and time until end of utterance. Using simple combinations of such features to augment trigram language models gave up to an 8.4% perplexity benefit on the Switchboard corpus, and up to a 1.0% relative reduction in word error rate (0.3% absolute) on the Verbmobil II corpus. (C) 2011 Elsevier B.V. All rights reserved.
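The abstract reports its results as perplexity and relative word-error-rate reductions. The Python sketch below illustrates how those two quantities are conventionally computed. It is not the authors' code: the function names `perplexity` and `relative_reduction` and all the probability values are hypothetical, chosen only to show the arithmetic.

```python
import math


def perplexity(word_probs):
    """Perplexity: exp of the mean negative log probability per word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    total_log = sum(math.log(p) for p in word_probs)
    return math.exp(-total_log / len(word_probs))


def relative_reduction(baseline, improved):
    """Fractional improvement of `improved` over `baseline` (lower is better)."""
    return (baseline - improved) / baseline


# Hypothetical per-word probabilities for the same short word sequence,
# first from a plain trigram model, then from a model whose probabilities
# were adjusted using some extra (e.g. prosodic) information.
baseline_probs = [0.10, 0.02, 0.25, 0.05]
augmented_probs = [0.12, 0.02, 0.27, 0.05]

ppl_base = perplexity(baseline_probs)
ppl_aug = perplexity(augmented_probs)
print(f"baseline perplexity:  {ppl_base:.2f}")
print(f"augmented perplexity: {ppl_aug:.2f}")
print(f"relative reduction:   {relative_reduction(ppl_base, ppl_aug):.1%}")
```

Under the same definition, a 0.3-point absolute drop in word error rate that amounts to a 1.0% relative reduction would be consistent with a baseline word error rate of roughly 30%, although the abstract does not state the baseline figure.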
Pages: 161-174
Number of pages: 14