Prosodic and temporal features for language modeling for dialog

Cited by: 10
Authors
Ward, Nigel G. [1]
Vega, Alejandro [1]
Baumann, Timo [2]
Affiliations
[1] Univ Texas El Paso, El Paso, TX 79968 USA
[2] Univ Potsdam, Dept Linguist, D-14476 Potsdam, Germany
Funding
National Science Foundation (USA)
Keywords
Dialog dynamics; Dialog state; Prosody; Interlocutor behavior; Word probabilities; Prediction; Perplexity; Speech recognition; Switchboard corpus; Verbmobil corpus
DOI
10.1016/j.specom.2011.07.009
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
If we can model the cognitive and communicative processes underlying speech, we should be able to better predict what a speaker will do. With this idea as inspiration, we examine a number of prosodic and timing features as potential sources of information on what words the speaker is likely to say next. In spontaneous dialog we find that word probabilities do vary with such features. Using perplexity as the metric, the most informative of these features included recent speaking rate, volume, and pitch, and time until end of utterance. Using simple combinations of such features to augment trigram language models gave up to an 8.4% perplexity benefit on the Switchboard corpus, and up to a 1.0% relative reduction in word error rate (0.3% absolute) on the Verbmobil II corpus. (C) 2011 Elsevier B.V. All rights reserved.
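The abstract reports its gains as relative reductions in perplexity and word error rate. As a reading aid, the following minimal sketch shows how those two quantities are conventionally computed. It is not the authors' code: the function names `perplexity` and `relative_reduction` are hypothetical, and every number below is a toy value chosen for illustration, not data from the paper.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the probability the
    model assigned to each word (standard definition, base 2)."""
    avg_log2 = sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** (-avg_log2)


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Toy per-word probabilities (illustrative only, not from the paper):
# a plain trigram model versus the same model after some hypothetical
# feature-based adjustment that raises the probability of two words.
trigram_probs = [0.10, 0.05, 0.20, 0.10]
adjusted_probs = [0.12, 0.05, 0.25, 0.10]

ppl_base = perplexity(trigram_probs)
ppl_adj = perplexity(adjusted_probs)
print(f"baseline perplexity: {ppl_base:.2f}")
print(f"adjusted perplexity: {ppl_adj:.2f}")
print(f"relative reduction:  {relative_reduction(ppl_base, ppl_adj):.1%}")
```

The same `relative_reduction` helper covers the word error rate figures: a 0.3-point absolute drop that amounts to a 1.0% relative reduction implies a baseline word error rate of roughly 30%, subject to rounding in the reported numbers.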
Pages: 161-174
Number of pages: 14