Temporal Alignment for Deep Neural Networks

Cited by: 0
Authors
Lin, Payton [1 ]
Lyu, Dau-Cheng [2 ]
Chang, Yun-Fan [1 ]
Tsao, Yu [1 ]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
[2] ASUS Headquarters, Adv Technol Div, Kaohsiung, Taiwan
Source
2015 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP) | 2015
Keywords
alignment; temporal features; deep neural networks; hidden Markov models; speech recognition
DOI
(not available)
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Alternative features were derived from an extracted temporal envelope bank (TBANK). These simplified temporal representations were investigated in alignment procedures to generate frame-level training labels for deep neural networks (DNNs). TBANK features improved temporal alignments both for supervised training and for context-dependent tree building.
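The abstract does not specify the TBANK extraction pipeline. A minimal sketch of a temporal envelope bank under common assumptions (log-spaced bandpass filters, Hilbert envelopes, per-frame averaging; the band edges, filter order, and frame parameters below are illustrative, not from the paper):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def tbank(signal, fs=16000, n_bands=8, fmin=100.0, fmax=7000.0,
          frame_len=400, hop=160):
    """Temporal envelope bank: per-band envelopes averaged per frame.

    Band edges are spaced logarithmically between fmin and fmax
    (an assumption; the paper does not give the filterbank design).
    Returns an (n_frames, n_bands) feature matrix.
    """
    edges = np.logspace(np.log10(fmin), np.log10(fmax), n_bands + 1)
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.zeros((n_frames, n_bands))
    for b in range(n_bands):
        # Zero-phase bandpass filtering for band b.
        bb, ab = butter(4, [edges[b], edges[b + 1]], btype='band', fs=fs)
        band = filtfilt(bb, ab, signal)
        # Temporal envelope via the analytic-signal magnitude.
        env = np.abs(hilbert(band))
        # Average the envelope over each analysis frame.
        for t in range(n_frames):
            feats[t, b] = env[t * hop : t * hop + frame_len].mean()
    return feats
```

Frame-averaged envelopes like these could then be fed to a standard GMM-HMM forced-alignment pass to produce the frame-level DNN training labels the abstract describes.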
Pages: 108-112 (5 pages)