<bold>ARTICULATORY FEATURE-BASED METHODS FOR ACOUSTIC AND AUDIO-VISUAL SPEECH RECOGNITION: SUMMARY FROM THE 2006 JHU SUMMER WORKSHOP</bold>

Times cited: 0
Authors
Livescu, Karen [1]
Cetin, Oezguer [2]
Hasegawa-Johnson, Mark [3]
King, Simon [4]
Bartels, Chris [5]
Borges, Nash [6]
Kantor, Arthur [3]
Lal, Partha [4]
Yung, Lisa [6]
Bezman, Ari [7]
Dawson-Haggerty, Stephen [8]
Woods, Bronwyn [9]
Frankel, Joe [2, 4]
Magimai-Doss, Mathew [2]
Saenko, Kate [1]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] ICSI, Berkeley, CA USA
[3] Univ Illinois, Champaign, IL USA
[4] Univ Edinburgh, Edinburgh EH8 9YL, Midlothian, Scotland
[5] Univ Washington, Seattle, WA 98195 USA
[6] Johns Hopkins Univ, Baltimore, MD 21218 USA
[7] Dartmouth Coll, Hanover, NH 03755 USA
[8] Harvard Univ, Cambridge, MA 02138 USA
[9] Swarthmore Coll, Swarthmore, PA 19081 USA
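Source
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | 2007
Funding
National Science Foundation (USA); Swiss National Science Foundation
Keywords
speech recognition; speech processing
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract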
We report on investigations, conducted at the 2006 Johns Hopkins Workshop, into the use of articulatory features (AFs) for observation and pronunciation models in speech recognition. In the area of observation modeling, we use the outputs of AF classifiers both directly, in an extension of hybrid HMM/neural network models, and as part of the observation vector, an extension of the "tandem" approach. In the area of pronunciation modeling, we investigate a model having multiple streams of AF states with soft synchrony constraints, for both audio-only and audio-visual recognition. The models are implemented as dynamic Bayesian networks, and tested on tasks from the Small-Vocabulary Switchboard (SVitchboard) corpus and the CUAVE audio-visual digits corpus. Finally, we analyze AF classification and forced alignment using a newly collected set of feature-level manual transcriptions.
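The pronunciation model described above lets several AF streams move through a word's articulatory targets somewhat independently, while "soft" synchrony constraints penalize the streams for drifting too far apart. The following is a minimal, hypothetical sketch of that idea, not the workshop's DBN implementation. It assumes that each stream's state is an index into a shared sequence of targets, that the asynchrony between two streams is the absolute difference of their indices, and that a small probability table (`ASYNC_PROB`, with made-up values) scores that difference. The sketch also simplifies by multiplying one factor per pair of streams. The stream names (`lips`, `tongue`, `glottis`) and the helpers `async_weight` and `joint_position_distribution` are illustrative names only.

```python
from itertools import product

# Hypothetical soft-synchrony table P(|i - j| = k) for a pair of AF streams.
# Illustrative values only; these are NOT the workshop's trained parameters.
ASYNC_PROB = {0: 0.7, 1: 0.25, 2: 0.05}


def async_weight(positions, table=ASYNC_PROB):
    """Unnormalized soft-synchrony weight for one joint configuration.

    `positions` maps each AF stream name to its current index in the
    word's sequence of articulatory targets. Each pair of streams
    contributes a factor table[|i - j|]. A gap larger than any key in
    the table gets weight 0, which acts as a hard cap on asynchrony.
    """
    names = sorted(positions)
    weight = 1.0
    for n, a in enumerate(names):
        for b in names[n + 1:]:
            weight *= table.get(abs(positions[a] - positions[b]), 0.0)
    return weight


def joint_position_distribution(streams, num_targets, table=ASYNC_PROB):
    """Normalize the weights over every joint configuration of one word."""
    weights = {}
    for combo in product(range(num_targets), repeat=len(streams)):
        w = async_weight(dict(zip(streams, combo)), table)
        if w > 0.0:
            weights[combo] = w
    total = sum(weights.values())
    return {combo: w / total for combo, w in weights.items()}


if __name__ == "__main__":
    # Two hypothetical streams moving through a 3-target word.
    dist = joint_position_distribution(("lips", "tongue"), num_targets=3)
    for combo, p in sorted(dist.items()):
        print(combo, round(p, 4))
```

In the models summarized in the abstract, constraints of this kind are encoded as variables in a dynamic Bayesian network rather than by exhaustive enumeration. The enumeration here only illustrates how a soft constraint becomes a normalized distribution in which synchronous configurations such as (0, 0) are favored over widely separated ones such as (0, 2).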
Pages: 621+ (end page not listed)
Number of pages: 2