Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: Benefit to speech recognition

Cited by: 0
Authors
Sudhakar, Prasad [1 ]
Ghosh, Prasanta Kumar [2 ]
Affiliations
[1] Catholic Univ Louvain, ICTEAM ELEN, Louvain La Neuve, Belgium
[2] Indian Inst Sci IISc, Dept Elect Engn, Bangalore, Karnataka, India
Keywords
phonetic recognition; acoustic-to-articulatory inversion; smoothing; Gaussian mixture model; sparsity; Chambolle-Pock; l1 minimization
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper considers speech recognition using articulatory features estimated by acoustic-to-articulatory inversion (AAI). A recently proposed sparse smoothing approach is used to post-process the estimates from Gaussian mixture model (GMM) based AAI under the minimum mean squared error (MMSE) criterion. It is well known that low-pass smoothing as post-processing improves AAI performance. Sparse smoothing, on the other hand, not only improves AAI performance but also preserves MMSE optimality for as many estimates as possible. In this work, we investigate the benefit of preserving MMSE optimality during post-processing by using the smoothed articulatory estimates in a broad-class phonetic recognition task. Experimental results show that low-pass filter based smoothing causes a significant drop in recognition accuracy compared with using articulatory estimates without any smoothing. In contrast, the recognition accuracy obtained with sparsely smoothed articulatory features is similar to that obtained with articulatory features taken directly from GMM-based AAI without any post-processing. Thus, sparse smoothing benefits both inversion performance and recognition accuracy, whereas low-pass smoothing improves inversion performance at the cost of recognition accuracy.
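The contrast drawn in the abstract, that sparse smoothing leaves as many estimates as possible untouched while a low-pass filter alters nearly all of them, can be illustrated with a small sketch. This record does not give the paper's actual formulation, so everything below is an assumption made for illustration: the objective (an l1 data term plus a quadratic second-difference penalty), the parameter values, the moving-average stand-in for a low-pass filter, and the synthetic trajectory. The solver uses basic Chambolle-Pock primal-dual iterations, in line with the listed keywords.

```python
import math

def second_diff(x):
    """K x: second-order differences of x (length len(x) - 2)."""
    return [x[i] - 2.0 * x[i + 1] + x[i + 2] for i in range(len(x) - 2)]

def second_diff_adjoint(z, n):
    """K^T z: adjoint of second_diff, mapping back to length n."""
    out = [0.0] * n
    for i, v in enumerate(z):
        out[i] += v
        out[i + 1] -= 2.0 * v
        out[i + 2] += v
    return out

def soft(u, t):
    """Soft-thresholding, the proximal map of t * |u|."""
    if u > t:
        return u - t
    if u < -t:
        return u + t
    return 0.0

def sparse_smooth(y, lam=10.0, iters=3000, tau=0.24, sigma=0.24):
    """Chambolle-Pock iterations for the ASSUMED objective
    min_x ||x - y||_1 + (lam / 2) * ||K x||_2^2.
    Step sizes satisfy tau * sigma * ||K||^2 < 1 because ||K||^2 <= 16."""
    n = len(y)
    x, x_bar = list(y), list(y)
    z = [0.0] * (n - 2)
    for _ in range(iters):
        # Dual step: prox of the conjugate of (lam / 2) * ||.||^2.
        kx = second_diff(x_bar)
        z = [(zi + sigma * ki) / (1.0 + sigma / lam) for zi, ki in zip(z, kx)]
        # Primal step: prox of the l1 data term, centred on y.
        ktz = second_diff_adjoint(z, n)
        x_new = [yi + soft(xi - tau * ki - yi, tau)
                 for xi, ki, yi in zip(x, ktz, y)]
        # Over-relaxation of the primal variable.
        x_bar = [2.0 * a - b for a, b in zip(x_new, x)]
        x = x_new
    return x

def moving_average(y, width=5):
    """Simple low-pass stand-in: centred moving average, shorter at the edges."""
    half = width // 2
    out = []
    for i in range(len(y)):
        window = y[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

# Hypothetical trajectory: a smooth curve with a few isolated jumps.
clean = [math.sin(2.0 * math.pi * i / 60.0) for i in range(120)]
y = list(clean)
for i in (25, 60, 95):
    y[i] += 1.0

x_sparse = sparse_smooth(y)
x_lowpass = moving_average(y)

# Count samples that each method leaves exactly equal to the input.
kept_sparse = sum(a == b for a, b in zip(x_sparse, y))
kept_lowpass = sum(a == b for a, b in zip(x_lowpass, y))
print("unchanged by sparse smoothing:", kept_sparse)
print("unchanged by moving average:", kept_lowpass)
```

Because the l1 proximal step returns an exact zero correction whenever the proposed change is small, samples that are already consistent with a smooth trajectory keep their input value. In the setting of the abstract, those would be the estimates that remain MMSE-optimal.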
Pages: 169-173
Page count: 5