Automatic Pronunciation Generation by Utilizing a Semi-supervised Deep Neural Networks

Authors
Takahashi, Naoya [1]
Naghibi, Tofigh [2]
Pfister, Beat [2]
Affiliations
[1] Sony Corporation, Tokyo, Japan
[2] Swiss Federal Institute of Technology (ETH Zurich), Speech Processing Group, Zurich, Switzerland
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016
Keywords
speech recognition; deep neural networks; semi-supervised learning; dictionary; sub-word unit; k-dimensional Viterbi
DOI
10.21437/Interspeech.2016-761
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Phonemic or phonetic sub-word units are the most commonly used atomic elements for representing speech signals in modern ASR systems. However, they are not the optimal choice, for several reasons: the large amount of effort required to handcraft a pronunciation dictionary, pronunciation variations, human mistakes, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation estimation and acoustic modeling method that takes only the orthographic transcription as input and jointly estimates a set of sub-word units and a reliable dictionary. Experimental results on the TIMIT dataset show that the proposed method, which is based on semi-supervised training of a deep neural network, substantially outperforms phoneme-based continuous speech recognition.
Pages: 1141-1145
Number of pages: 5