Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages

Cited by: 12
Authors
Baby, Arun [1]
Prakash, Jeena J. [1]
Vignesh, Rupak [1]
Murthy, Hema A. [1]
Affiliations
[1] Indian Institute of Technology Madras, Department of Computer Science and Engineering, Chennai, Tamil Nadu, India
Keywords
Deep Neural Networks; Convolutional Neural Networks; phonetic segmentation; signal processing cues
DOI
10.21437/Interspeech.2017-666
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic detection of phoneme boundaries is an important sub-task in building speech processing applications, especially text-to-speech synthesis (TTS) systems. The main drawback of Gaussian mixture model-hidden Markov model (GMM-HMM) based forced alignment is that the phoneme boundaries are not explicitly modeled. In earlier work, we proposed the use of signal processing cues in tandem with GMM-HMM based forced alignment for boundary correction when building Indian language TTS systems. In this paper, we capitalise on robust acoustic modeling techniques such as deep neural networks (DNN) and convolutional deep neural networks (CNN). The GMM-HMM based forced alignment is replaced by DNN-HMM/CNN-HMM based forced alignment. Signal processing cues are then used to correct the segment boundaries obtained from DNN-HMM/CNN-HMM segmentation. TTS systems built using these boundaries show a relative improvement in synthesis quality.
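
The boundary-correction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not say how cues are computed or applied, so the sketch assumes the cues are already available as a list of candidate boundary times and simply moves each forced-alignment boundary to the nearest cue within a tolerance. The function name `snap_boundaries`, the `tolerance` parameter, and its 20 ms default are all hypothetical.

```python
from bisect import bisect_left


def snap_boundaries(aligned, cues, tolerance=0.02):
    """Move each aligned boundary (seconds) to the nearest cue within `tolerance`.

    Assumption: `cues` are candidate boundary times from some signal
    processing front end; how they are produced is outside this sketch.
    """
    cues = sorted(cues)
    corrected = []
    for t in aligned:
        i = bisect_left(cues, t)
        # Only the cue just before and the cue just after t can be nearest.
        candidates = cues[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda c: abs(c - t), default=None)
        if nearest is not None and abs(nearest - t) <= tolerance:
            corrected.append(nearest)   # trust the cue
        else:
            corrected.append(t)         # no cue close enough: keep the aligner's boundary
    return corrected


# Hypothetical boundaries from a DNN-HMM aligner and hypothetical cue times.
aligned = [0.10, 0.25, 0.40]
cues = [0.11, 0.30, 0.405]
print(snap_boundaries(aligned, cues))
```

In this sketch a boundary with no nearby cue is left unchanged, so the aligner's output acts as the fallback. The sketch does not prevent two boundaries from snapping to the same cue; a real system would need to handle that case.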
Pages: 3817-3821
Page count: 5