A COMPARATIVE STUDY OF ACOUSTIC-TO-ARTICULATORY INVERSION FOR NEUTRAL AND WHISPERED SPEECH

Times Cited: 0
Authors
Illa, Aravind [1]
Meenakshi, Nisha G. [1]
Ghosh, Prasanta Kumar [1]
Affiliations
[1] Indian Inst Sci IISc, Elect Engn, Bangalore 560012, Karnataka, India
Source
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017
Keywords
acoustic-to-articulatory inversion; neutral speech; whispered speech; electromagnetic articulography; FEATURES; MODEL
DOI
Not available
CLC Number
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Whispered speech is known to differ from neutral speech in both its acoustic and articulatory characteristics. In this study, we compare the accuracy with which articulation can be recovered from the acoustics of each type of speech individually. Acoustic-to-articulatory inversion (AAI) is performed for twelve articulatory features using a deep neural network (DNN), with data obtained from four subjects. We consider AAI in matched and mismatched train-test conditions, where the speech types in training and test are identical and different, respectively. Experiments in the matched condition reveal that AAI performance for whispered speech drops significantly compared to that for neutral speech only for the jaw, tongue tip, and tongue body, consistently across all four subjects. This indicates that whispered speech encodes information about the remaining articulators to a degree similar to that of neutral speech. Experiments in the mismatched condition show a consistent drop in AAI performance compared to the matched condition. This drop from the matched to the mismatched condition is found to be the highest for the upper lip, which suggests that upper lip movement could be encoded differently in whispered speech than in neutral speech.
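As a rough illustration of the setup described in the abstract, the minimal Python sketch below shows a frame-level DNN regressor mapping stacked acoustic features to twelve articulatory trajectories. The use of Keras, the MFCC dimensionality, the context-window width, and the layer sizes are assumptions made for illustration, not the authors' exact configuration; the training data here is random placeholder data standing in for parallel acoustic-EMA recordings.

# Minimal sketch of DNN-based acoustic-to-articulatory inversion (AAI).
# Assumptions (not from the paper): MFCC inputs, an 11-frame context
# window, two 512-unit tanh layers, and the Adam optimizer.
import numpy as np
from tensorflow.keras import layers, models

N_MFCC = 13    # assumed acoustic feature dimension per frame
CONTEXT = 11   # assumed number of stacked context frames
N_ARTIC = 12   # twelve articulatory features, as in the paper

def build_aai_dnn():
    model = models.Sequential([
        layers.Input(shape=(N_MFCC * CONTEXT,)),  # stacked context frames
        layers.Dense(512, activation="tanh"),
        layers.Dense(512, activation="tanh"),
        layers.Dense(N_ARTIC),                    # linear output: positions
    ])
    model.compile(optimizer="adam", loss="mse")   # regression to EMA traces
    return model

# Matched condition: train and test on the same speech type (e.g. neutral).
# Mismatched condition: train on one type (e.g. neutral), test on the other.
model = build_aai_dnn()
X_train = np.random.randn(1000, N_MFCC * CONTEXT)  # placeholder acoustics
Y_train = np.random.randn(1000, N_ARTIC)           # placeholder EMA targets
model.fit(X_train, Y_train, epochs=5, batch_size=32, verbose=0)

In practice, such an inversion model is typically evaluated by the correlation coefficient or root-mean-squared error between the predicted and measured articulatory trajectories on held-out test utterances, computed per articulator.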
Pages: 5075-5079
Page Count: 5