Multi-style learning with denoising autoencoders for acoustic modeling in the internet of things (IoT)

Cited by: 11
Authors
Lin, Payton [1 ]
Lyu, Dau-Cheng [2 ]
Chen, Fei [3 ]
Wang, Syu-Siang [1 ]
Tsao, Yu [1 ]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Sect 2, Acad Rd, Taipei 11529, Taiwan
[2] ASUSTek Comp Inc, Taipei, Taiwan
[3] Southern Univ Sci & Technol, Dept Elect & Elect Engn, Shenzhen, Peoples R China
Keywords
Deep learning; Deep neural networks; Multi-style training; Deep denoising autoencoders; Mixed training; Representation learning; Data combination; Data synthesis; Noise injection theory; Feature compensation; Automatic speech recognition; Internet of things (IoT); DEEP NEURAL-NETWORKS; BANDWIDTH TRAINING DATA; SPEECH RECOGNITION; NOISE INJECTION; FRONT-END; REGULARIZATION; DISTRIBUTIONS; COMPENSATION;
DOI
10.1016/j.csl.2017.02.001
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a multi-style learning (multi-style training deep learning) procedure that relies on deep denoising autoencoders (DAEs) to extract and organize the most discriminative information in a training database. Traditionally, multi-style training procedures require either collecting or artificially creating data samples (e.g., by noise injection or data combination) and training a deep neural network (DNN) on all of these conditions. To expand the applicability of deep learning, the present study instead adopts a DAE to augment the original training set. First, a DAE is used to synthesize data that captures useful structure in the input distribution. Next, this synthetic data is mixed into the original training set to exploit the ability of DNN classifiers to learn complex decision boundaries under heterogeneous conditions. By assigning a DAE to synthesize additional examples of representative variations, multi-style learning makes class boundaries less sensitive to corruption by forcing back-end DNNs to emphasize the most discriminative patterns. Moreover, this deep learning technique reduces the cost and time of data collection and is easy to incorporate into the internet of things (IoT). Results showed that these data-mixed DNNs provided consistent performance improvements without requiring any preprocessing of the test sets. (C) 2017 Elsevier Ltd. All rights reserved.
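The augmentation pipeline described in the abstract (train a DAE on corrupted inputs, synthesize reconstructions, and mix them back into the original training set) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the one-hidden-layer network, layer sizes, Gaussian corruption level, and the toy two-class data are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    """One-hidden-layer DAE trained to reconstruct clean inputs from noisy ones."""

    def __init__(self, n_in, n_hidden, lr=0.1, noise_std=0.3):
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.lr, self.noise_std = lr, noise_std

    def reconstruct(self, X):
        h = sigmoid(X @ self.W1 + self.b1)
        return h @ self.W2 + self.b2  # linear output layer

    def train(self, X, epochs=200):
        for _ in range(epochs):
            # Corrupt the input (noise injection), but reconstruct the clean target.
            X_noisy = X + rng.normal(0, self.noise_std, X.shape)
            h = sigmoid(X_noisy @ self.W1 + self.b1)
            err = (h @ self.W2 + self.b2) - X
            # Backpropagation for squared-error reconstruction loss.
            dW2 = h.T @ err / len(X)
            db2 = err.mean(axis=0)
            dh = err @ self.W2.T * h * (1 - h)
            dW1 = X_noisy.T @ dh / len(X)
            db1 = dh.mean(axis=0)
            self.W1 -= self.lr * dW1; self.b1 -= self.lr * db1
            self.W2 -= self.lr * dW2; self.b2 -= self.lr * db2

# Toy stand-in for acoustic features: two Gaussian classes in 8 dimensions.
X = np.vstack([rng.normal(-1, 0.5, (100, 8)), rng.normal(1, 0.5, (100, 8))])
y = np.array([0] * 100 + [1] * 100)

dae = DenoisingAutoencoder(n_in=8, n_hidden=4)
dae.train(X)

# Synthesize additional examples of representative variations, then mix them
# with the original data; a back-end DNN classifier would train on X_mixed.
X_synth = dae.reconstruct(X + rng.normal(0, 0.3, X.shape))
X_mixed = np.vstack([X, X_synth])
y_mixed = np.concatenate([y, y])
print(X_mixed.shape)  # → (400, 8)
```

The key point the sketch illustrates is that no extra data collection or test-set preprocessing is needed: the DAE itself manufactures the additional training conditions, and only the training set is modified.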
Pages: 481-495
Page count: 15