NASAL SPEECH SOUNDS DETECTION USING CONNECTIONIST TEMPORAL CLASSIFICATION

Times cited: 0
Authors
Cernak, Milos [1]
Tong, Sibo [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
Phone attributes; nasal sounds; connectionist temporal classification; RECOGNITION;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Phone attributes, also known as distinctive or phonological features, form an important classification of speech sounds used in automatic speech processing. Training conventional phone attribute detectors (classifiers), whether based on acoustic measurements or deep learning, requires reasonably accurate phone boundary segmentation. This paper proposes training a phone attribute detector without phone alignment, using end-to-end phone attribute modeling based on connectionist temporal classification. Experiments on the nasal phone attribute, performed on the LibriSpeech database, confirm that the proposed system outperforms a conventional deep neural network detector, even when both are trained on the same data. Further improvements are observed with more training data. The conventional complex system, which consists of feature extraction, phone forced alignment and deep neural network training, is replaced by a simpler Python package based on PyTorch, released as open source.
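To illustrate why connectionist temporal classification removes the need for phone boundaries, the following is a minimal dependency-free sketch of the standard CTC forward recursion. It is not the authors' released PyTorch package. The three-class layout (blank, nasal, non-nasal), the class indices, and the toy frame posteriors are assumptions made for illustration only. The function sums the probability of every frame-level alignment that collapses to the target attribute sequence, so only the label sequence is needed as supervision.

```python
from itertools import product

# Hypothetical class indices for this sketch (not taken from the paper).
BLANK, NASAL, NON_NASAL = 0, 1, 2


def ctc_likelihood(probs, labels, blank=BLANK):
    """Probability of `labels` summed over all alignments (CTC forward pass).

    probs  : per-frame class posteriors, probs[t][k]
    labels : target sequence without blanks, e.g. [NON_NASAL, NASAL]
    """
    # Extended target: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    n_states = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [0.0] * n_states
    alpha[0] = probs[0][ext[0]]
    if n_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * n_states
        for s in range(n_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skip the blank between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # A valid path ends in the last label or the trailing blank.
    return alpha[-1] + (alpha[-2] if n_states > 1 else 0.0)


def collapse(path, blank=BLANK):
    """Merge repeated symbols, then drop blanks (the CTC collapsing rule)."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def brute_force_likelihood(probs, labels, blank=BLANK):
    """Reference value: enumerate every frame-level path explicitly."""
    total = 0.0
    for path in product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path, blank) == list(labels):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


# Toy posteriors for four frames over (blank, nasal, non-nasal).
toy_probs = [
    [0.2, 0.1, 0.7],
    [0.3, 0.6, 0.1],
    [0.1, 0.8, 0.1],
    [0.5, 0.1, 0.4],
]
toy_target = [NON_NASAL, NASAL, NON_NASAL]
```

In training, the negative logarithm of this likelihood would serve as the loss. The brute-force function is included only to check the recursion on tiny inputs, since it enumerates every path and scales exponentially with the number of frames.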
Pages: 5574-5578
Page count: 5