Trends in audio signal feature extraction methods

Cited by: 223
Authors
Sharma, Garima [1]
Umapathy, Kartikeyan [1]
Krishnan, Sridhar [1]
Affiliations
[1] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada
Keywords
Audio; Speech; Signal; Feature extraction; Survey; Machine learning; SPECTRAL-ANALYSIS; SPEECH ANALYSIS; CLASSIFICATION; TIME; RECOGNITION; MUSIC; BINARY; DISCRIMINATION; PREDICTION; RETRIEVAL
DOI
10.1016/j.apacoust.2019.107020
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Audio signal processing algorithms generally involve analyzing a signal, extracting its properties, predicting its behaviour, recognizing whether any pattern is present in it, and determining how a particular signal correlates with other similar signals. Audio signals include music, speech and environmental sounds. Over the last few decades, audio signal processing has grown significantly in terms of signal analysis and classification. It has also been shown that many existing problems can be solved by integrating modern machine learning (ML) algorithms with audio signal processing techniques. The performance of any ML algorithm depends on the features on which it is trained and tested. Hence, feature extraction is one of the most vital parts of a machine learning process. The aim of this study is to summarize the literature on audio signal processing, focusing specifically on feature extraction techniques. In this survey, the temporal domain, frequency domain, cepstral domain, wavelet domain and time-frequency domain features are discussed in detail. (C) 2019 Elsevier Ltd. All rights reserved.
Pages: 21