Improved subject-independent acoustic-to-articulatory inversion

Cited: 16
Authors
Afshan, Amber [1 ]
Ghosh, Prasanta Kumar [2 ]
Affiliations
[1] National Institute of Technology Karnataka (NITK), Mangalore 575025, India
[2] Indian Institute of Science, Department of Electrical Engineering, Bangalore 560012, Karnataka, India
Keywords
Acoustic-to-articulatory inversion; Subject-independence; Generic acoustic space; Adaptation; Maximum-likelihood; Features
DOI
10.1016/j.specom.2014.07.005
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
In subject-independent acoustic-to-articulatory inversion, the articulatory kinematics of a test subject are estimated assuming that the training corpus does not include data from the test subject. The training corpus in subject-independent inversion (SII) is formed with acoustic and articulatory kinematics data, and the acoustic mismatch between the training and test subjects is then estimated by acoustic normalization, using acoustic data drawn from a large pool of speakers called the generic acoustic space (GAS). In this work, we focus on improving SII performance through better acoustic normalization and adaptation. We propose unsupervised and several supervised ways of clustering GAS for acoustic normalization, and we perform an adaptation of the acoustic models of GAS using the acoustic data of the training and test subjects in SII. It is found that SII performance improves significantly (~25% relative on average) over subject-dependent inversion when the acoustic clusters in GAS correspond to phonetic units (or states of 3-state phonetic HMMs) and when the acoustic model built on GAS is adapted to the training and test subjects while optimizing the inversion criterion. (C) 2014 Elsevier B.V. All rights reserved.
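As a rough illustration of the normalization idea in the abstract, the sketch below clusters a pooled multi-speaker feature set (a stand-in for GAS) without supervision and then shifts a test subject's features, cluster by cluster, toward the GAS centroids. This is a hypothetical simplification in Python: the MFCC-like features, the k-means clustering, and the mean-offset normalization rule are assumptions made for illustration, not the paper's method, which clusters GAS by phonetic units or HMM states and adapts the GAS acoustic model under the inversion criterion.

import numpy as np
from sklearn.cluster import KMeans

# Illustrative sketch only (not the paper's algorithm): cluster a
# generic acoustic space (GAS) of pooled multi-speaker features, then
# shift each test-subject frame so that, per cluster, the subject's
# mean coincides with the GAS centroid.

def build_gas_clusters(gas_features, n_clusters=64, seed=0):
    # gas_features: (n_frames, n_dims) features pooled over many speakers
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km.fit(gas_features)
    return km

def normalize_subject(subject_features, km):
    # Assign every subject frame to its nearest GAS cluster.
    labels = km.predict(subject_features)
    normalized = subject_features.copy()
    for c in np.unique(labels):
        idx = labels == c
        # Remove the subject-specific offset within cluster c.
        offset = km.cluster_centers_[c] - subject_features[idx].mean(axis=0)
        normalized[idx] += offset
    return normalized

# Usage with random stand-in data (13-dim, MFCC-like frames).
rng = np.random.default_rng(0)
gas = rng.normal(size=(5000, 13))              # pooled GAS features
subject = rng.normal(loc=0.5, size=(300, 13))  # mismatched test subject
km = build_gas_clusters(gas, n_clusters=16)
subject_normalized = normalize_subject(subject, km)

In the paper itself, the best-performing clusters were phonetic (3-state HMM) rather than unsupervised, and adaptation operated on the GAS acoustic model while optimizing the inversion criterion; the sketch above only conveys the cluster-then-normalize structure.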
Pages: 1 - 16
Number of pages: 16
Related Papers
50 records in total
  • [1] Improved subject-independent acoustic-to-articulatory inversion
    Afshan, Amber
    Ghosh, Prasanta Kumar
    SPEECH COMMUNICATION: 1 - 16
  • [2] A SUBJECT-INDEPENDENT ACOUSTIC-TO-ARTICULATORY INVERSION
    Ghosh, Prasanta Kumar
    Narayanan, Shrikanth S.
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4624 - 4627
  • [3] Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion
    Ghosh, Prasanta Kumar
    Narayanan, Shrikanth
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2011, 130 (04): EL251 - EL257
  • [4] Better acoustic normalization in subject independent acoustic-to-articulatory inversion: benefit to recognition
    Afshan, Amber
    Ghosh, Prasanta Kumar
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5395 - 5399
  • [5] Autoregressive Articulatory WaveNet Flow for Speaker-Independent Acoustic-to-Articulatory Inversion
    Bozorg, Narjes
    Johnson, Michael T.
    Soleymanpour, Mohammad
    2021 INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2021, : 156 - 161
  • [6] Jerk Minimization for Acoustic-To-Articulatory Inversion
    Rajpal, Avni
    Patil, Hemant A.
    9th ISCA Speech Synthesis Workshop, SSW 2016, 2016, : 82 - 87
  • [7] Formant Trajectories for Acoustic-to-Articulatory Inversion
    Özbek, I. Yücel
    Hasegawa-Johnson, Mark
    Demirekler, Mübeccel
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2783 - +
  • [8] MLLR-PRSW for Kinematic-Independent Acoustic-to-Articulatory Inversion
    Bozorg, Narjes
    Johnson, Michael T.
    2019 IEEE 19TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT 2019), 2019,
  • [9] Incorporation of phonetic constraints in acoustic-to-articulatory inversion
    Potard, Blaise
    Laprie, Yves
    Ouni, Slim
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2008, 123 (04): 2310 - 2323
  • [10] A DEEP RECURRENT APPROACH FOR ACOUSTIC-TO-ARTICULATORY INVERSION
    Liu, Peng
    Yu, Quanjie
    Wu, Zhiyong
    Kang, Shiyin
    Meng, Helen
    Cai, Lianhong
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4450 - 4454