Phone Recognition for Lhasa-Tibetan Based on Articulatory Features Augmentation Learning

Cited by: 0

Authors:

Zhao, Yue [1]

Zhao, Rui [1]

Xu, Xiaona [1]

Wu, Licheng [1]

Ji, Qiang [2]

Affiliations:

[1] Minzu University of China, School of Information Engineering, Beijing, People's Republic of China

[2] Rensselaer Polytechnic Institute, Troy, NY, USA

Source:

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) | 2016

Keywords:

articulatory features; latent attribute learning; sparse coding; phone recognition

DOI:

Not available

Chinese Library Classification number:

TP301 [Theory and methods]

Discipline classification code:

081202

Abstract:

In a series of studies, articulatory features used as speech attributes in automatic speech recognition systems have been shown to improve performance. Existing articulatory features are defined by phoneticians as a set of articulatory descriptions of phones; they carry semantic information explaining how humans produce speech sounds through the interaction of different physiological structures. However, these manually specified attributes capture the articulation information of a language incompletely and are not distinctive enough for accurate phoneme recognition. In this paper, we address the problem of obtaining a more complete articulatory feature representation by using sparse coding methods. Taking Lhasa-Tibetan as an example, we learn latent attributes that sparsely represent additional speech articulation information in Tibetan. In our Tibetan phone recognition experiments, models based on the concatenated semantic and latent speech attributes achieved better accuracy than existing methods based on semantic speech attributes fused with cepstral features.
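
The idea of augmenting semantic attributes with sparsely coded latent attributes can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a fixed, randomly initialised dictionary (the paper's reference list points to K-SVD for dictionary learning, which is not reproduced here), uses plain ISTA as a stand-in sparse coder, and all names, dimensions, and data (`sparse_codes`, `augment`, 13 cepstral coefficients, 20 semantic attributes, 32 latent atoms) are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the L1 norm: shrinks values toward zero.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(X, D, A, lam):
    # Lasso-style sparse coding objective for a fixed dictionary D.
    return 0.5 * np.sum((X - A @ D) ** 2) + lam * np.sum(np.abs(A))

def sparse_codes(X, D, lam=0.1, n_iter=200):
    # ISTA: minimise 0.5*||X - A D||_F^2 + lam*||A||_1 over A.
    # X: (n_frames, n_features), D: (n_atoms, n_features).
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        A = soft_threshold(A - grad / L, lam / L)
    return A

def augment(semantic, latent):
    # Concatenate semantic and latent attribute vectors frame by frame.
    return np.concatenate([semantic, latent], axis=1)

# Hypothetical toy data: random frames stand in for acoustic features,
# random scores stand in for semantic attribute detector outputs.
rng = np.random.default_rng(0)
n_frames, n_feat, n_atoms, n_semantic = 50, 13, 32, 20
X = rng.standard_normal((n_frames, n_feat))
D = rng.standard_normal((n_atoms, n_feat))
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm atoms
semantic = rng.random((n_frames, n_semantic))

latent = sparse_codes(X, D)
features = augment(semantic, latent)
```

In this sketch each frame ends up with `n_semantic + n_atoms` attribute values, which a downstream phone classifier would consume in place of the semantic attributes alone.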

Pages: 4
