A deep learning approaches in text-to-speech system: a systematic review and recent research perspective

Cited by: 0

Authors:

Yogesh Kumar

Apeksha Koul

Chamkaur Singh

Affiliations:

[1] Pandit Deendayal Energy University, Department of Computer Science and Engineering, School of Technology

[2] Punjabi University, Department of Computer Science and Engineering

[3] Chandigarh Group of Colleges, Department of Computer Applications

Source:

Multimedia Tools and Applications | 2023 / Vol. 82

Keywords:

Text-to-speech; Artificial intelligence; Speech synthesis; Deep learning; Pronunciation generation; Linguistic analysis

DOI:

Not available

Abstract:

Text-to-speech (TTS) systems have come a long way in the last decade and are now a popular research topic for building human-computer interaction systems. Although a range of speech synthesis models is available for various languages and applications, chosen according to domain requirements, recent developments in speech synthesis are primarily attributed to deep learning-based techniques, which have improved a variety of application scenarios, including intelligent speech interaction, chatbots, and conversational artificial intelligence (AI). This survey article discusses text-to-speech systems as an active topic of study that has achieved significant progress in the recent decade, particularly for Indian and non-Indian languages. The study also covers the lifecycle of text-to-speech systems and the platforms developed for them. We searched Web of Science, PubMed, Scopus, EBSCO (Elton B. Stephens Company), and Google Scholar for survey articles published up to May 2021 on TTS systems in various languages based on different approaches. This survey article examines the contributions made by various researchers to Indian and non-Indian language text-to-speech systems, the techniques used to implement them, and the associated challenges in designing TTS systems. The work also compares text-to-speech systems for different languages on quality metrics such as recognition rate, accuracy, TTS score, precision, recall, and F1-score. Further, the study summarizes existing ideas and their shortcomings, emphasizing the scope for future research in Indian and non-Indian language TTS, which may assist beginners in designing robust TTS systems.

Pages: 15171-15197

Page count: 26
