Prosodic and temporal features for language modeling for dialog

Cited by: 10
Authors
Ward, Nigel G. [1]
Vega, Alejandro [1]
Baumann, Timo [2]
Affiliations
[1] Univ Texas El Paso, El Paso, TX 79968 USA
[2] Univ Potsdam, Dept Linguist, D-14476 Potsdam, Germany
Funding
National Science Foundation (USA)
Keywords
Dialog dynamics; Dialog state; Prosody; Interlocutor behavior; Word probabilities; Prediction; Perplexity; Speech recognition; Switchboard corpus; Verbmobil corpus
DOI
10.1016/j.specom.2011.07.009
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
If we can model the cognitive and communicative processes underlying speech, we should be able to better predict what a speaker will do. With this idea as inspiration, we examine a number of prosodic and timing features as potential sources of information on what words the speaker is likely to say next. In spontaneous dialog we find that word probabilities do vary with such features. Using perplexity as the metric, the most informative of these included recent speaking rate, volume, and pitch, and time until end of utterance. Using simple combinations of such features to augment trigram language models gave up to an 8.4% perplexity benefit on the Switchboard corpus, and up to a 1.0% relative reduction in word error rate (0.3% absolute) on the Verbmobil II corpus. © 2011 Elsevier B.V. All rights reserved.
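As a rough illustration of the two measures quoted in the abstract, the sketch below computes perplexity as the exponential of the average negative log-probability per word, and reports the relative reduction between a baseline and an augmented model. The function names and all probability values are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's actual feature-combination method is not reproduced here.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the model's probability for each word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    # Average negative log-probability per word, then exponentiate.
    avg_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log)


def relative_reduction(baseline, improved):
    """Fractional reduction of a metric (perplexity or word error rate)."""
    return (baseline - improved) / baseline


# Hypothetical per-word probabilities for the same four-word utterance
# (illustrative values only, not results from the paper).
baseline_probs = [0.10, 0.02, 0.25, 0.05]   # e.g. a plain trigram model
augmented_probs = [0.12, 0.02, 0.27, 0.06]  # e.g. the same model with extra features

ppl_base = perplexity(baseline_probs)
ppl_aug = perplexity(augmented_probs)
benefit = relative_reduction(ppl_base, ppl_aug)

print(f"baseline perplexity:  {ppl_base:.2f}")
print(f"augmented perplexity: {ppl_aug:.2f}")
print(f"relative perplexity reduction: {benefit:.1%}")
```

The same `relative_reduction` helper applies to word error rate: a relative reduction equals the absolute reduction divided by the baseline value, which is why a small absolute change can correspond to a larger or smaller relative figure depending on the baseline.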
Pages: 161-174
Number of pages: 14