Investigation of Stacked Deep Neural Networks and Mixture Density Networks for Acoustic-to-Articulatory Inversion

Cited by: 0

Authors:

Xie, Xurong [1,2]

Liu, Xunying [1,2]

Lee, Tan [1]

Wang, Lan [2]

Affiliations:

[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China

Source:

2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2018

Keywords:

articulatory inversion; stacked; deep neural network; mixture density network; EMA;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Acoustic-to-articulatory inversion, which predicts articulatory movement from the acoustic signal, is useful for many applications such as talking heads, speech recognition, and education. DNN-based techniques have achieved state-of-the-art performance in this area. This paper investigates different stacked network architectures for acoustic-to-articulatory inversion. Two levels of DNNs or mixture density networks (MDNs) can be connected using different types of auxiliary features extracted from the first-level network, including bottleneck features, directly generated features, and articulatory features predicted via the MLPG algorithm. In the experiments, stacked systems using DNNs, time-delay DNNs (TDNNs), RNNs, and MDNs were evaluated on both the MNGU0 English EMA database and the AIMSL Chinese EMA database. On the default configuration of the MNGU0 data using LSF acoustic features, the proposed stacked system using feed-forward MDNs with ellipsoid variance and MLPG-generated features achieved an RMSE of 0.718 mm, which is similar to that of the RNN and RNN-MDN BLSTM systems, whose training is slower and more difficult.

Pages: 36-40

Page count: 5

