Automatic Pronunciation Generation by Utilizing a Semi-supervised Deep Neural Networks

Cited by: 0
Authors
Takahashi, Naoya [1]
Naghibi, Tofigh [2]
Pfister, Beat [2]
Affiliations
[1] Sony Corp, Tokyo, Japan
[2] Swiss Fed Inst Technol, Speech Proc Grp, Zurich, Switzerland
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
speech recognition; deep neural networks; semi-supervised learning; dictionary; sub-word unit; k-dimensional Viterbi
DOI
10.21437/Interspeech.2016-761
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Phonemic or phonetic sub-word units are the most commonly used atomic elements for representing speech signals in modern ASR systems. However, they are not the optimal choice for several reasons: the large effort required to handcraft a pronunciation dictionary, pronunciation variations, human mistakes, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation estimation and acoustic modeling method that takes only the orthographic transcription and jointly estimates a set of sub-word units and a reliable dictionary. Experimental results show that the proposed method, which is based on semi-supervised training of a deep neural network, largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset.
Pages: 1141-1145
Page count: 5