Predicting Articulatory Movement from Text Using Deep Architecture with Stacked Bottleneck Features

Cited: 0
Authors
Wei, Zhen [1 ]
Wu, Zhizheng [3 ]
Xie, Lei [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Software & Microelect, Xian, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[3] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9YL, Midlothian, Scotland
Source
2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA) | 2016
Funding
US National Science Foundation;
Keywords
articulatory movement prediction; stacked bottleneck features; deep neural network; NEURAL-NETWORKS; BOTTLE-NECK FEATURES; SPEECH;
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Code
0808; 0809;
Abstract
Using speech or text to predict articulatory movements has potential benefits for speech-related applications. Many approaches have been proposed for the acoustic-to-articulatory inversion problem, but predicting articulatory movements from text has received far less attention. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement prediction from text. We also combine full-context features, state and phone information with stacked bottleneck features, which provide wide linguistic context, as the network input to improve prediction performance. On the MNGU0 data set, our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which capture important contextual information.
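The pipeline the abstract describes can be sketched as follows: a first DNN with a narrow bottleneck layer maps frame-level linguistic features toward articulatory targets, its bottleneck activations are stacked over a window of neighbouring frames to widen the linguistic context, and the stacked features are appended to the original input of a second regression DNN that predicts the articulatory (EMA) trajectories. All dimensions, layer sizes, the context window, and the untrained random weights below are illustrative assumptions, not the paper's actual configuration.

```python
# Hedged sketch of a stacked-bottleneck-feature pipeline (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

class MLP:
    """Minimal feed-forward net; weights are random (untrained) for illustration."""
    def __init__(self, sizes):
        self.layers = [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
                       for m, n in zip(sizes[:-1], sizes[1:])]

    def forward(self, x, return_hidden=None):
        h = x
        for i, (W, b) in enumerate(self.layers):
            h = h @ W + b
            if i < len(self.layers) - 1:
                h = relu(h)
            if return_hidden == i:
                return h  # activations of hidden layer i (e.g. the bottleneck)
        return h

# Assumed toy dimensions: frames, linguistic dim, bottleneck dim, EMA dim, context.
T, D_LING, D_BN, D_EMA, CTX = 50, 100, 32, 12, 2

ling = rng.standard_normal((T, D_LING))      # frame-level linguistic features

# First DNN: linguistic features -> EMA, with a narrow bottleneck (layer index 1).
bn_net = MLP([D_LING, 256, D_BN, 256, D_EMA])
bn = bn_net.forward(ling, return_hidden=1)   # (T, D_BN) bottleneck features

# Stack bottleneck features from +/-CTX neighbouring frames (edge-padded),
# then append them to the original linguistic input of the second DNN.
padded = np.pad(bn, ((CTX, CTX), (0, 0)), mode="edge")
stacked = np.hstack([padded[t:t + T] for t in range(2 * CTX + 1)])
x2 = np.hstack([ling, stacked])              # (T, D_LING + (2*CTX+1)*D_BN)

# Second DNN maps the context-augmented input to articulatory trajectories.
reg_net = MLP([x2.shape[1], 256, 256, D_EMA])
ema_pred = reg_net.forward(x2)               # (T, D_EMA) predicted EMA frames
```

The key design point is that the second network sees not just the current frame's features but bottleneck summaries of surrounding frames, which is how the stacking supplies wide linguistic context.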
Pages: 6