Prosodic and temporal features for language modeling for dialog

Cited by: 10

Authors:

Ward, Nigel G. [1]

Vega, Alejandro [1]

Baumann, Timo [2]

Affiliations:

[1] Univ Texas El Paso, El Paso, TX 79968 USA

[2] Univ Potsdam, Dept Linguist, D-14476 Potsdam, Germany

Source:

SPEECH COMMUNICATION | 2012 / Vol. 54 / Issue 02

Funding:

U.S. National Science Foundation;

Keywords:

Dialog dynamics; Dialog state; Prosody; Interlocutor behavior; Word probabilities; Prediction; Perplexity; Speech recognition; Switchboard corpus; Verbmobil corpus;

DOI:

10.1016/j.specom.2011.07.009

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

If we can model the cognitive and communicative processes underlying speech, we should be able to better predict what a speaker will do. With this idea as inspiration, we examine a number of prosodic and timing features as potential sources of information on what words the speaker is likely to say next. In spontaneous dialog we find that word probabilities do vary with such features. Using perplexity as the metric, the most informative of these included recent speaking rate, volume, and pitch, and time until end of utterance. Using simple combinations of such features to augment trigram language models gave up to an 8.4% perplexity benefit on the Switchboard corpus, and up to a 1.0% relative reduction in word error rate (0.3% absolute) on the Verbmobil II corpus. (C) 2011 Elsevier B.V. All rights reserved.
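As a rough illustration of the metric named in the abstract, the sketch below computes perplexity from per-word model probabilities and expresses an improvement as a relative reduction, which is how a figure such as 8.4% would typically be reported. The function names and all probability values are hypothetical and are not taken from the paper; the paper's actual feature-combination method is not shown here.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the probability the
    language model assigned to each word that actually occurred."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    # Average negative log2 probability per word (cross-entropy in bits).
    cross_entropy = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** cross_entropy


def relative_reduction(baseline, improved):
    """Fractional reduction of a metric relative to its baseline value."""
    return (baseline - improved) / baseline


# Hypothetical per-word probabilities from a baseline trigram model and
# from the same model after some feature-based adjustment (made-up numbers).
baseline_probs = [0.10, 0.05, 0.20, 0.02]
adjusted_probs = [0.11, 0.06, 0.20, 0.025]

ppl_base = perplexity(baseline_probs)
ppl_adj = perplexity(adjusted_probs)
print(f"baseline perplexity: {ppl_base:.2f}")
print(f"adjusted perplexity: {ppl_adj:.2f}")
print(f"relative reduction:  {relative_reduction(ppl_base, ppl_adj):.1%}")
```

Lower perplexity means the model assigned higher probability, on average, to the words that were actually spoken, so a positive relative reduction indicates a benefit.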

Pages: 161-174

Page count: 14
