Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages

Cited by: 12
Authors
Baby, Arun [1 ]
Prakash, Jeena J. [1 ]
Vignesh, Rupak [1 ]
Murthy, Hema A. [1 ]
Affiliations
[1] Indian Inst Technol Madras, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
Keywords
Deep Neural Networks; Convolutional Neural Networks; phonetic segmentation; signal processing cues
DOI
10.21437/Interspeech.2017-666
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic detection of phoneme boundaries is an important sub-task in building speech processing applications, especially text-to-speech synthesis (TTS) systems. The main drawback of Gaussian mixture model-hidden Markov model (GMM-HMM) based forced alignment is that the phoneme boundaries are not explicitly modeled. In earlier work, we proposed the use of signal processing cues in tandem with GMM-HMM based forced alignment for boundary correction when building Indian language TTS systems. In this paper, we capitalise on robust acoustic modeling techniques such as deep neural networks (DNN) and convolutional deep neural networks (CNN). The GMM-HMM based forced alignment is replaced by DNN-HMM/CNN-HMM based forced alignment. Signal processing cues are then used to correct the segment boundaries obtained from DNN-HMM/CNN-HMM segmentation. TTS systems built using these boundaries show a relative improvement in synthesis quality.
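The boundary-correction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the signal processing cues have already been reduced to a list of candidate boundary times in seconds, and the function name `snap_boundaries` and the tolerance `max_shift` are hypothetical. Each forced-alignment boundary is moved to the nearest cue only when that cue lies within the tolerance.

```python
from bisect import bisect_left


def snap_boundaries(boundaries, cues, max_shift=0.02):
    """Move each boundary to its nearest cue if within max_shift seconds.

    boundaries: phone boundary times from forced alignment (assumed input).
    cues: candidate boundary times from signal processing (assumed input).
    max_shift: hypothetical tolerance; not a value taken from the paper.
    """
    cues = sorted(cues)
    corrected = []
    for b in boundaries:
        # Only the cues immediately left and right of b can be nearest.
        i = bisect_left(cues, b)
        candidates = cues[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda c: abs(c - b), default=None)
        if nearest is not None and abs(nearest - b) <= max_shift:
            corrected.append(nearest)
        else:
            # No cue close enough: keep the forced-alignment boundary.
            corrected.append(b)
    return corrected


hmm_boundaries = [0.10, 0.25, 0.40]
cue_times = [0.11, 0.30, 0.395]
print(snap_boundaries(hmm_boundaries, cue_times))
```

In this sketch a boundary with no nearby cue is left where the aligner placed it. The sketch does not enforce that corrected boundaries stay in increasing order, which a real system would need to check.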
Pages: 3817-3821
Page count: 5