A review on speech processing using machine learning paradigm

Cited by: 29

Authors:

Bhangale, Kishor Barasu [1]

Mohanaprasad, K. [1]

Affiliation:

[1] VIT Univ, Sch Elect Engn SENSE, Chennai 600127, Tamil Nadu, India

Source:

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY | 2021 / Vol. 24 / Issue 02

Keywords:

Speech processing; Speech recognition; Machine learning; Speech feature extraction; Speech classification; Speech emotion recognition; INDEPENDENT COMPONENT ANALYSIS; SUPPORT VECTOR MACHINES; SPEAKER RECOGNITION; CLASSIFICATION; HMM; FEATURES; SHIMMER; MODELS; JITTER; ADAPTATION;

DOI:

10.1007/s10772-021-09808-0

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Speech processing plays a crucial role in many signal processing applications, and the last decade has brought rapid progress driven by the machine learning paradigm. Speech processing is closely related to computational linguistics, human-machine interaction, natural language processing, and psycholinguistics. This review article mainly discusses the feature extraction techniques and machine learning classifiers employed in speech processing and recognition. The performance of several machine learning techniques is evaluated for speech emotion recognition on the Berlin EmoDB database. Further, it outlines the broad application areas and challenges of machine learning for speech processing.

Pages: 367-388

Page count: 22

