A COMPARATIVE STUDY OF ACOUSTIC-TO-ARTICULATORY INVERSION FOR NEUTRAL AND WHISPERED SPEECH

Times Cited: 0
Authors
Illa, Aravind [1 ]
Meenakshi, Nisha G. [1 ]
Ghosh, Prasanta Kumar [1 ]
Affiliations
[1] Indian Inst Sci IISc, Elect Engn, Bangalore 560012, Karnataka, India
Source
2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017
Keywords
acoustic-to-articulatory inversion; neutral speech; whispered speech; electromagnetic articulography; FEATURES; MODEL;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
Whispered speech is known to differ from neutral speech in both its acoustic and articulatory characteristics. In this study, we compare the accuracy with which articulation can be recovered from the acoustics of each type of speech individually. Acoustic-to-articulatory inversion (AAI) is performed for twelve articulatory features using a deep neural network (DNN), with data obtained from four subjects. We consider AAI in matched and mismatched train-test conditions, where the speech types used in training and testing are identical and different, respectively. Experiments in the matched condition reveal that the AAI performance for whispered speech drops significantly compared to that for neutral speech only for the jaw, tongue tip, and tongue body, consistently across all four subjects. This indicates that whispered speech encodes information about the remaining articulators to a degree similar to that of neutral speech. Experiments in the mismatched condition show a consistent drop in AAI performance compared to the matched condition. This drop from the matched to the mismatched condition is found to be highest for the upper lip, which suggests that upper-lip movement may be encoded differently in whispered speech than in neutral speech.
Pages: 5075-5079
Page count: 5
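
To make the experimental setup in the abstract concrete, the following minimal Python sketch trains a feedforward DNN to map acoustic frames to twelve articulatory trajectories and scores it with a per-channel Pearson correlation, a commonly used AAI metric. The network shape, the MFCC-with-context input dimension, the optimizer settings, and the AAINet/correlation/train names are all illustrative assumptions; only the DNN-based mapping to twelve articulatory features comes from the paper.

# Hedged sketch of DNN-based acoustic-to-articulatory inversion (AAI).
# All dimensions and hyperparameters are illustrative assumptions; the
# paper's exact features, architecture, and training setup may differ.
import torch
import torch.nn as nn

ACOUSTIC_DIM = 39 * 11   # e.g., 13 MFCCs + deltas, 11-frame context (assumed)
ARTIC_DIM = 12           # twelve articulatory features, as in the paper

class AAINet(nn.Module):
    """Feedforward DNN mapping acoustic frames to articulatory positions."""
    def __init__(self, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACOUSTIC_DIM, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, ARTIC_DIM),   # predicted articulatory trajectories
        )

    def forward(self, x):
        return self.net(x)

def correlation(pred, target):
    """Pearson correlation per articulatory channel."""
    p = pred - pred.mean(dim=0)
    t = target - target.mean(dim=0)
    return (p * t).sum(dim=0) / (p.norm(dim=0) * t.norm(dim=0) + 1e-8)

def train(model, frames, trajectories, epochs=20, lr=1e-3):
    """Minimize mean squared error between predicted and measured trajectories."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(frames), trajectories)
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    # Random stand-ins for frame-aligned neutral / whispered data.
    neutral_x, neutral_y = torch.randn(2000, ACOUSTIC_DIM), torch.randn(2000, ARTIC_DIM)
    whisper_x, whisper_y = torch.randn(2000, ACOUSTIC_DIM), torch.randn(2000, ARTIC_DIM)

    model = train(AAINet(), neutral_x, neutral_y)   # trained on neutral speech
    with torch.no_grad():
        matched = correlation(model(neutral_x), neutral_y)     # matched condition
        mismatched = correlation(model(whisper_x), whisper_y)  # mismatched condition
    print("matched CC per channel:", matched)
    print("mismatched CC per channel:", mismatched)

Evaluating the model trained on neutral data against whispered data (or vice versa) mimics the paper's mismatched condition; here the random tensors merely stand in for frame-aligned acoustic features and electromagnetic articulography recordings.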