The impact of speaking rate on acoustic-to-articulatory inversion

Cited by: 16

Authors:

Illa, Aravind [1]

Ghosh, Prasanta Kumar [1]

Affiliations:

[1] Indian Inst Sci, Dept Elect Engn, Bangalore 560012, Karnataka, India

Source:

COMPUTER SPEECH AND LANGUAGE | 2020 / Vol. 59

Keywords:

Acoustic-to-articulatory inversion; Speaking rate; Electromagnetic articulograph; NEURAL-NETWORK MODEL; SPEECH; VELOCITY; MOVEMENT; JAW; LIP; COARTICULATION; CONSTRAINTS; VARIABILITY; ACQUISITION;

DOI:

10.1016/j.csl.2019.05.004

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Acoustic characteristics and articulatory movements are known to vary with speaking rate. This study investigates the role of speaking rate in acoustic-to-articulatory inversion (AAI) performance using deep neural networks (DNNs). Since a fast speaking rate causes fast articulatory motion as well as changes in the spectro-temporal characteristics of the speech signal, the articulatory-acoustic map at a fast speaking rate could differ from that at a slow speaking rate. We examine how these differences alter the accuracy with which different articulatory positions can be recovered from the acoustics. AAI experiments are performed in both matched and mismatched train-test conditions using data from five subjects at three different rates: normal, fast and slow (the fast and slow rates are at least 1.3 times faster and slower than the normal rate). Experiments in matched conditions reveal that the errors in estimating the vertical motion of sensors on the tongue articulators from acoustics at a fast speaking rate are significantly higher than those at a slow speaking rate. Experiments in mismatched conditions reveal a consistent drop in AAI performance compared to the matched condition. Further experiments, in which AAI is trained on acoustic-articulatory data pooled from different speaking rates, reveal that a single DNN-based AAI model is capable of learning multiple rate-specific mappings. © 2019 Elsevier Ltd. All rights reserved.


Pages: 75-90

Number of pages: 16

