Investigation of Stacked Deep Neural Networks and Mixture Density Networks for Acoustic-to-Articulatory Inversion

Authors
Xie, Xurong [1, 2]
Liu, Xunying [1, 2]
Lee, Tan [1 ]
Wang, Lan [2 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
Source
2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2018
Keywords
articulatory inversion; stacked; deep neural network; mixture density network; EMA
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Acoustic-to-articulatory inversion, which predicts articulatory movement from the acoustic signal, is useful for many applications such as talking heads, speech recognition, and education. DNN-based techniques have achieved state-of-the-art performance in this area. This paper investigates different stacked network architectures for acoustic-to-articulatory inversion. Two levels of DNNs or mixture density networks (MDNs) can be connected using different types of auxiliary features extracted from the first-level network, including bottleneck features, directly generated features, and articulatory features predicted via the MLPG algorithm. In the experiments, stacked systems using DNNs, time-delay DNNs (TDNNs), RNNs, and MDNs were evaluated on both the MNGU0 English EMA database and the AIMSL Chinese EMA database. On the default configuration of the MNGU0 data using LSF acoustic features, the proposed stacked system using feed-forward MDNs with ellipsoid variance and MLPG-generated features achieved an RMSE of 0.718 mm, which is similar to that of the RNN and RNN-MDN BLSTM systems, whose training is slower and more difficult.
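The stacking idea in the abstract can be illustrated with a minimal sketch, assuming the simplest of the three connection types (bottleneck features). This is not the authors' implementation: the layer sizes, feature dimensions, random untrained weights, and the helper names `init_mlp`, `mlp_forward`, and `rmse` are all hypothetical, chosen only to show how a first-level network's hidden activations might be appended to the acoustic input of a second-level network, and how RMSE is computed on the predicted trajectories.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Create (weight, bias) pairs for a feed-forward network (hypothetical helper)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, layers):
    """Forward pass with tanh hidden layers and a linear output layer.

    Returns the output and the list of hidden activations.
    """
    hidden = []
    h = x
    for W, b in layers[:-1]:
        h = np.tanh(h @ W + b)
        hidden.append(h)
    W, b = layers[-1]
    return h @ W + b, hidden

def rmse(pred, target):
    """Root mean squared error over all frames and dimensions."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

rng = np.random.default_rng(0)

# Assumed dimensions, for illustration only.
n_frames, acoustic_dim, ema_dim, bottleneck_dim = 200, 40, 12, 16

# Synthetic stand-ins for acoustic features and EMA targets.
acoustic = rng.normal(size=(n_frames, acoustic_dim))
ema_target = rng.normal(size=(n_frames, ema_dim))

# Level 1: acoustic -> EMA, with a narrow last hidden layer as the bottleneck.
level1 = init_mlp([acoustic_dim, 64, bottleneck_dim, ema_dim], rng)
_, hidden1 = mlp_forward(acoustic, level1)
bottleneck = hidden1[-1]  # shape: (n_frames, bottleneck_dim)

# Level 2: acoustic features concatenated with the level-1 bottleneck features.
level2_input = np.concatenate([acoustic, bottleneck], axis=1)
level2 = init_mlp([acoustic_dim + bottleneck_dim, 64, ema_dim], rng)
ema_pred, _ = mlp_forward(level2_input, level2)

# Weights are untrained, so this error value is meaningless as a result.
print(level2_input.shape, ema_pred.shape, rmse(ema_pred, ema_target))
```

In a real system both levels would be trained, and the other connection types named in the abstract (directly generated features, or MLPG-smoothed predictions from an MDN) would replace `bottleneck` in the concatenation step.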
Pages: 36-40
Page count: 5