Automatic Pronunciation Generation by Utilizing a Semi-supervised Deep Neural Networks

Cited by: 0

Authors:

Takahashi, Naoya [1]

Naghibi, Tofigh [2]

Pfister, Beat [2]

Affiliations:

[1] Sony Corp, Tokyo, Japan

[2] Swiss Fed Inst Technol, Speech Proc Grp, Zurich, Switzerland

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

speech recognition; deep neural networks; semi-supervised learning; dictionary; sub-word unit; k-dimensional Viterbi

DOI:

10.21437/Interspeech.2016-761

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Phonemic or phonetic sub-word units are the most commonly used atomic elements to represent speech signals in modern ASR systems. However, they are not the optimal choice for several reasons: the large effort required to handcraft a pronunciation dictionary, pronunciation variations, human mistakes, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation estimation and acoustic modeling method that takes only the orthographic transcription to jointly estimate a set of sub-word units and a reliable dictionary. Experimental results show that the proposed method, which is based on semi-supervised training of a deep neural network, largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset.

Pages: 1141-1145

Number of pages: 5
