Deep Learning in Audio Classification

Cited by: 3
Authors
Wang, Yaqin [1]
Wei-Kocsis, Jin [1]
Springer, John A. [1]
Matson, Eric T. [1]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
Source
INFORMATION AND SOFTWARE TECHNOLOGIES, ICIST 2022 | 2022 / Vol. 1665
Keywords
Audio classification; Machine learning; Deep learning; Deep reinforcement learning; CONVOLUTIONAL NEURAL-NETWORKS;
DOI
10.1007/978-3-031-16302-9_5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Audio processing technology is everywhere in our lives. We ask our car to make a call for us while driving, or we let Alexa turn off the light when we don't want to get out of bed before sleep. In all of these audio-based applications and research, it is artificial intelligence (AI) and machine learning (ML) that make the computer or the smartphone understand us via our voice [1]. Deep learning (DL), an important part of AI and especially of ML, has had a great influence in many areas of AI- and ML-based research and applications. This paper focuses on deep learning structures and applications for audio classification. We conduct a detailed review of the literature on audio-based DL and deep reinforcement learning (DRL) approaches and applications. We also discuss the limitations of, and possible future work for, audio-based DL approaches.
Pages: 64-77
Page count: 14
Related papers
65 in total
[1]   Convolutional Neural Networks for Speech Recognition [J].
Abdel-Hamid, Ossama ;
Mohamed, Abdel-Rahman ;
Jiang, Hui ;
Deng, Li ;
Penn, Gerald ;
Yu, Dong .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) :1533-1545
[2]   A Review of Deep Learning Based Methods for Acoustic Scene Classification [J].
Abesser, Jakob .
APPLIED SCIENCES-BASEL, 2020, 10 (06)
[3]   Personalization of Hearing Aid Compression by Human-in-the-Loop Deep Reinforcement Learning [J].
Alamdari, Nasim ;
Lobarinas, Edward ;
Kehtarnavaz, Nasser .
IEEE ACCESS, 2020, 8 :203503-203515
[4]  
[Anonymous], 2009, The elements of statistical learning, DOI 10.1007/978-0-387-84858-7
[5]  
Bishop C., 2006, Pattern Recognition and Machine Learning
[6]  
Lipton ZC, 2015, arXiv:1506.00019, DOI 10.48550/arXiv.1506.00019
[7]  
Caruana R., 2006, P 23 INT C MACH LEAR, P25, DOI 10.1145/1143844.1143865
[8]   Environmental sound classification with dilated convolutions [J].
Chen, Yan ;
Guo, Qian ;
Liang, Xinyan ;
Wang, Jiang ;
Qian, Yuhua .
APPLIED ACOUSTICS, 2019, 148 :123-132
[9]  
Cho KYHY, 2014, arXiv:1406.1078, DOI 10.48550/arXiv.1406.1078
[10]   Semi-supervised Training for Sequence-to-Sequence Speech Recognition Using Reinforcement Learning [J].
Chung, Hoon ;
Jeon, Hyeong-Bae ;
Park, Jeon Gue .
2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,