A review of deep learning techniques for speech processing

Times cited: 84
Authors
Mehrish, Ambuj [1]
Majumder, Navonil [1]
Bharadwaj, Rishabh [1]
Mihalcea, Rada [2]
Poria, Soujanya [1]
Affiliations
[1] Singapore Univ Technol & Design, ISTD, Singapore, Singapore
[2] Univ Michigan, Ann Arbor, MI USA
Keywords
Deep learning; Speech processing; Transformers; Survey; Trends; TEXT-TO-SPEECH; CONVOLUTIONAL NEURAL-NETWORKS; UNSUPERVISED DOMAIN ADAPTATION; SPEAKER RECOGNITION; VOICE CONVERSION; WAVE-FORM; QUALITY PREDICTION; PLUS ALGORITHM; ENHANCEMENT; REPRESENTATION;
DOI
10.1016/j.inffus.2023.101869
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, automatic speech recognition, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches, such as MFCC and HMM, to more recent advances in deep learning architectures, such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover various speech-processing tasks, datasets, and benchmarks used in the literature and describe how different deep-learning networks have been utilized to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field's evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field.
Pages: 55