A Discrete Wavelet Transform-Based Voice Activity Detection and Noise Classification With Sub-Band Selection

Cited by: 1
Authors
Abdullah, Salinna [1]
Zamani, Majid [1]
Demosthenous, Andreas [1]
Affiliations
[1] UCL, Dept Elect & Elect Engn, Torrington Pl, London WC1E 7JE, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Discrete wavelet transform; mel-frequency cepstral coefficients; multilayer perceptron; noise classification; sub-band selection; voice activity detection; speech enhancement
DOI
10.1109/ISCAS51556.2021.9401647
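Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract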
A real-time discrete wavelet transform-based adaptive voice activity detector and sub-band selection for feature extraction are proposed for noise classification, which can be used in a speech processing pipeline. The voice activity detection and sub-band selection rely on wavelet energy features and the feature extraction process involves the extraction of mel-frequency cepstral coefficients from selected wavelet sub-bands and mean absolute values of all sub-bands. The method combined with a feedforward neural network with two hidden layers could be added to speech enhancement systems and deployed in hearing devices such as cochlear implants. In comparison to the conventional short-time Fourier transform-based technique, it has higher F1 scores and classification accuracies (with a mean of 0.916 and 90.1%, respectively) across five different noise types (babble, factory, pink, Volvo (car) and white noise), a significantly smaller feature set with 21 features, reduced memory requirement, faster training convergence and about half the computational cost.
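The wavelet energy features and per-sub-band mean absolute values mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the mother wavelet, the decomposition depth, the threshold rule, or the selection criterion, so the Haar wavelet, the three-level depth, the fixed `factor` threshold, the "keep the highest-energy bands" rule, and every function name below are assumptions chosen only to keep the example self-contained. The MFCC extraction and the two-hidden-layer network are omitted.

```python
import math

def haar_dwt(signal):
    """One level of an orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def wavelet_subbands(frame, levels=3):
    """Multi-level decomposition; returns [d1, d2, ..., dL, aL]."""
    bands = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)
    bands.append(approx)
    return bands

def band_energy(band):
    """Wavelet energy of one sub-band (sum of squared coefficients)."""
    return sum(x * x for x in band)

def mean_abs(band):
    """Mean absolute value of one sub-band's coefficients."""
    return sum(abs(x) for x in band) / len(band)

def is_speech(frame, noise_energy, levels=3, factor=2.0):
    """Assumed VAD rule: flag the frame as speech when its total wavelet
    energy exceeds `factor` times a running noise-energy estimate."""
    total = sum(band_energy(b) for b in wavelet_subbands(frame, levels))
    return total > factor * noise_energy

def select_subbands(bands, keep=2):
    """Assumed selection rule: indices of the `keep` highest-energy sub-bands."""
    energies = [band_energy(b) for b in bands]
    order = sorted(range(len(bands)), key=lambda i: energies[i], reverse=True)
    return order[:keep]

# Example: an 8-sample constant frame puts all of its energy in the final
# approximation band, so that band is the one selected.
frame = [1.0] * 8
bands = wavelet_subbands(frame, levels=3)
features = [mean_abs(b) for b in bands]      # one value per sub-band
selected = select_subbands(bands, keep=1)
```

Because the Haar transform used here is orthonormal, the sub-band energies of an even-length frame sum to the frame's time-domain energy, which makes the threshold easy to reason about. In a fuller pipeline the selected sub-bands would feed an MFCC stage and the mean absolute values would be appended to form the feature vector.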
Pages: 5
Related papers
50 records in total
  • [1] Wavelet sub-band features for voice disorder detection and classification
    Gidaye, Girish
    Nirmal, Jagannath
    Ezzine, Kadria
    Frikha, Mondher
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (39-40) : 28499 - 28523
  • [2] Wavelet sub-band features for voice disorder detection and classification
    Girish Gidaye
    Jagannath Nirmal
    Kadria Ezzine
    Mondher Frikha
    Multimedia Tools and Applications, 2020, 79 : 28499 - 28523
  • [3] Enhanced discrete wavelet packet sub-band frequency edge detection using Hilbert transform
    Dibal, P. Y.
    Onwuka, E. N.
    Agajo, J.
    Alenoghena, C. O.
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2018, 16 (01)
  • [4] ROBUST VOICE ACTIVITY DETECTION BASED ON PITCH AND SUB-BAND ENERGY
    Zhang, Zhihao
    Lin, Jinlong
    SIGMAP 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MULTIMEDIA APPLICATIONS, 2009, : 44 - 48
  • [5] Morse wavelet transform-based features for voice liveness detection
    Gupta, Priyanka
    Patil, Hemant A.
    COMPUTER SPEECH AND LANGUAGE, 2024, 84
  • [6] Sub-band discrete cosine transform-based greyscale image watermarking using general regression neural network
    Mehta, Rajesh
    Rajpal, Navin
    Vishwakarma, Virendra P.
    INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2015, 8 (06) : 380 - 389
  • [7] Sub-band Feature Statistics Compensation Techniques Based on Discrete Wavelet Transform for Robust Speech Recognition
    Fan, Hao-Teng
    Hung, Jeih-weih
    ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 586 - 589
  • [8] Towards Discrete Wavelet Transform-based Human Activity Recognition
    Khare, Manish
    Jeon, Moongu
    SECOND INTERNATIONAL WORKSHOP ON PATTERN RECOGNITION, 2017, 10443
  • [9] Voice activity detection based on noise classification and dictionary selection
    Xie, Yining
    Huang, Jinjie
    Zhao, Jing
    He, Yongjun
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2016, 44 (12) : 121 - 126
  • [10] Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability
    Liu, Bin
    Tao, Jianhua
    Mo, Fuyuan
    Li, Ya
    Wen, Zhengqi
    Liu, Shanfeng
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 531 - +