Improving Pronunciation Erroneous Tendency Detection with Convolutional Long Short-Term Memory

Cited by: 0

Authors:

Yang, Longfei [1]

Gao, Yingming [2]

Xie, Yanlu [1]

Zhang, Jinsong [1]

Affiliations:

[1] Beijing Language & Culture Univ, Coll Informat Sci, Beijing 100083, Peoples R China

[2] Tech Univ Dresden, Inst Acoust & Speech Commun, D-01062 Dresden, Germany

Source:

2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP) | 2017

Funding:

Supported by the Fundamental Research Funds for the Central Universities

Keywords:

CAPT; mispronunciation detection; deep learning

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In computer assisted pronunciation teaching (CAPT), corrective feedback is much more desirable than a pure score, since it gives learners more information for correcting their erroneous pronunciations. For this purpose, we previously proposed the pronunciation erroneous tendency (PET), which describes errors in terms of articulation manner and constriction place, and we implemented PET detection systems with Gaussian Mixture Models (GMM) and Deep Neural Networks (DNN) in previous work [1-2]. However, achieving a high-performance system remains challenging because of the context dependency of PETs and the data sparseness problem. In this paper, we first introduce a data augmentation scheme to mitigate data sparseness. To further improve performance, we propose combining the long short-term memory (LSTM) network and the convolutional neural network (CNN) into a unified system. Experimental results suggest that the proposed CNN-LSTM outperforms the other models from our previous work.

Citation:

Pages: 52-56

Page count: 5

References (19 in total):

[1] Abdel-Hamid, Ossama; Mohamed, Abdel-Rahman; Jiang, Hui; Deng, Li; Penn, Gerald; Yu, Dong. Convolutional Neural Networks for Speech Recognition [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10): 1533-1545

[2] Bengio Yoshua, 2013, CORR

[3] Cao W, 2010, 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, P1922

[4] Duan RC, 2014, INTERSPEECH, P1478

[5] Gao YM, 2015, 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, P693

[6] Graves A, 2013, 2013 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), P273, DOI 10.1109/ASRU.2013.6707742

[7] Harrison Alissa M., 2009, INT WORKSH SPEECH LA, P45

[8] Hochreiter S, 1997, NEURAL COMPUT, V9, P1735, DOI [10.1162/neco.1997.9.1.1, 10.1007/978-3-642-24797-2]

[9] Hu W., 2013, INTERSPEECH 2013 14, P886

[10] Ko T., INTERSPEECH 2015
