Ensemble deep learning in speech signal tasks: A review

Cited by: 18

Authors:

Tanveer, M. ^{[1]}

Rastogi, Aryan ^{[2]}

Paliwal, Vardhan ^{[2]}

Ganaie, M. A. ^{[3]}

Malik, A. K. ^{[1]}

Del Ser, Javier ^{[4,5]}

Lin, Chin-Teng ^{[6]}

Affiliations:

[1] Indian Inst Technol Indore, Dept Math, Indore, Madhya Pradesh, India

[2] Indian Inst Technol Indore, Dept Elect Engn, Indore, Madhya Pradesh, India

[3] Univ Michigan, Dept Robot, Ann Arbor, MI USA

[4] TECNALIA, Basque Res & Technol Alliance BRTA, Derio, Spain

[5] Univ Basque Country UPV EHU, Bilbao, Spain

[6] Univ Technol Sydney, Human Centr AI Ctr, Sch Comp Sci, Sydney, Australia

Source:

NEUROCOMPUTING | 2023 / Volume 550

Keywords:

Deep learning; Ensemble deep learning; Speech signal; Speech recognition; Speech enhancement; EMOTION RECOGNITION; NEURAL-NETWORKS; GENDER RECOGNITION; ENHANCEMENT; SEPARATION; CLASSIFICATION; INFORMATION; ALGORITHM; FUTURE; AGE;

DOI:

10.1016/j.neucom.2023.126436

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Machine learning methods are extensively used for processing and analysing speech signals by virtue of their performance gains over multiple domains. Deep learning and ensemble learning are the two most commonly used techniques, which yield benchmark performance across different downstream tasks. Ensemble deep learning is a recent development which combines these two techniques into a robust architecture having substantial performance gains, as well as better generalization performance over the individual techniques. In this paper, we extensively review the use of ensemble deep learning methods for different speech signal related tasks, ranging from general objectives such as automatic speech recognition and voice activity detection, to more specific areas such as biomedical applications involving the detection of pathological speech or music genre detection. We provide a discussion on the use of different ensemble strategies such as bagging, boosting and stacking in the context of speech signals, and identify the various salient features and advantages from a broader perspective when coupled with deep learning architectures. The main objective of this study is to comprehensively evaluate existing works in the area of ensemble deep learning, and highlight the future directions that may be explored to further develop it as a tool for several speech related tasks. To the best of our knowledge, this is the first review study which primarily focuses on ensemble deep learning for speech applications. This study aims to serve as a valuable resource for researchers in academia and in industry working with speech signals, supporting advanced novel applications of ensemble deep learning models towards solving challenges in existing speech processing systems. © 2023 Elsevier B.V. All rights reserved.

Pages: 18

References: 161 in total

