Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages

Cited by: 12
Authors
Baby, Arun [1]
Prakash, Jeena J. [1]
Vignesh, Rupak [1]
Murthy, Hema A. [1]
Affiliation
[1] Indian Inst Technol Madras, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
Keywords
Deep Neural Networks; Convolutional Neural Networks; phonetic segmentation; signal processing cues
DOI
10.21437/Interspeech.2017-666
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic detection of phoneme boundaries is an important sub-task in building speech processing applications, especially text-to-speech synthesis (TTS) systems. The main drawback of Gaussian mixture model-hidden Markov model (GMM-HMM) based forced alignment is that the phoneme boundaries are not explicitly modeled. In earlier work, we proposed the use of signal processing cues in tandem with GMM-HMM based forced alignment for boundary correction when building Indian language TTS systems. In this paper, we capitalise on robust acoustic modeling techniques such as deep neural networks (DNN) and convolutional deep neural networks (CNN). The GMM-HMM based forced alignment is replaced by DNN-HMM/CNN-HMM based forced alignment. Signal processing cues are then used to correct the segment boundaries obtained using DNN-HMM/CNN-HMM segmentation. TTS systems built using these boundaries show a relative improvement in synthesis quality.
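The abstract does not spell out the boundary-correction rule, so the following is only a minimal sketch of one plausible reading: each boundary produced by forced alignment is moved to the nearest boundary suggested by a signal processing cue, provided the two lie within a tolerance. The function name `correct_boundaries`, the `max_shift` tolerance, and the nearest-cue rule are assumptions made for illustration and are not taken from the paper.

```python
from bisect import bisect_left


def correct_boundaries(aligned, cues, max_shift=0.025):
    """Snap forced-alignment boundaries to nearby cue boundaries.

    aligned   -- boundary times (seconds) from HMM forced alignment
    cues      -- candidate boundary times (seconds) from signal processing cues
    max_shift -- hypothetical tolerance; boundaries farther than this from
                 every cue are left unchanged
    """
    cues = sorted(cues)
    corrected = []
    for t in aligned:
        # Only the cue just before and just after t can be the nearest one.
        i = bisect_left(cues, t)
        candidates = cues[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda c: abs(c - t), default=None)
        if best is not None and abs(best - t) <= max_shift:
            corrected.append(best)
        else:
            corrected.append(t)
    return corrected


if __name__ == "__main__":
    hmm_boundaries = [0.10, 0.25, 0.40]
    cue_boundaries = [0.11, 0.30, 0.405]
    print(correct_boundaries(hmm_boundaries, cue_boundaries))
```

In this sketch the second boundary stays at 0.25 because no cue lies within the tolerance. The sketch does not guard against two boundaries snapping to the same cue, which a real system would need to handle.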
Pages: 3817-3821
Number of pages: 5