Pitch-robust acoustic feature using single frequency filtering for children's KWS

Cited by: 7
Authors
Pattanayak, Biswaranjan [1]
Pradhan, Gayadhar [1]
Affiliations
[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna, Bihar, India
Keywords
KWS; Pitch; Speaking rate; Single frequency filter; Pitch robust feature; EPOCH EXTRACTION; SPEECH RECOGNITION; SYSTEM; NOISE
DOI
10.1016/j.patrec.2021.07.015
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pitch and speaking rate are the two significant factors that cause acoustic mismatch in children's keyword spotting (KWS) systems. This paper proposes a pitch-robust acoustic feature based on single frequency filtering (SFF) for the development of a children's KWS system. In the proposed approach, SFF is used to compute the amplitude envelopes (AEs) of the speech data at D selected frequencies spaced on the Mel scale. The AEs are then averaged over short-time overlapping analysis frames and logarithmically compressed to give a D-dimensional feature vector per analysis frame, termed the Mel spaced single frequency average log envelope (MSSF-ALE). With the proposed MSSF-ALE feature, a deep neural network-hidden Markov model-based KWS system performs better than with standard Mel-frequency cepstral coefficients (MFCC) or with MFCC extracted from smoothed spectra. MSSF-ALE yields a relative improvement of 104.44% in term-weighted value (TWV) over MFCC for children's KWS. The performance of the KWS system is then evaluated with data-augmented training, obtained through explicit speaking rate modification of the training data set. With data-augmented training, MSSF-ALE provides a relative improvement of 195.94% in TWV over MFCC. MSSF-ALE also outperforms the other explored features in noisy test cases. © 2021 Elsevier B.V. All rights reserved.
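The feature extraction outlined in the abstract (SFF amplitude envelopes at D Mel-spaced frequencies, frame averaging, log compression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the commonly described SFF formulation: the signal is differenced, shifted so that the frequency of interest lands at half the sampling rate, and passed through a single-pole filter with its pole at z = -r. The function names, the pole radius `r`, the frequency range, the frame and hop lengths, and the Mel formula are all illustrative assumptions that do not come from this record.

```python
import numpy as np

def hz_to_mel(f_hz):
    # One common Mel-scale formula (an assumption; the paper may use another).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def sff_envelopes(signal, fs, freqs_hz, r=0.995):
    """Amplitude envelope of `signal` at each frequency in `freqs_hz` (shape D x N)."""
    # Difference the signal to suppress slowly varying components.
    x = np.diff(np.asarray(signal, dtype=float), prepend=0.0)
    n = np.arange(x.size)
    # Shift each frequency f_k to fs/2: omega_k = pi - 2*pi*f_k/fs.
    omega = np.pi - 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)[:, None] / fs
    shifted = x[None, :] * np.exp(1j * omega * n[None, :])
    # Single-pole filter H(z) = 1 / (1 + r z^-1), i.e. y[n] = x[n] - r * y[n-1],
    # run sample by sample but vectorised across the D frequencies.
    y = np.zeros_like(shifted)
    prev = np.zeros(shifted.shape[0], dtype=complex)
    for i in range(x.size):
        prev = shifted[:, i] - r * prev
        y[:, i] = prev
    return np.abs(y)

def mssf_ale(signal, fs, num_freqs=40, f_lo=100.0, f_hi=None,
             frame_ms=25.0, hop_ms=10.0, r=0.995):
    """Frame-averaged, log-compressed SFF envelopes: (num_frames, num_freqs)."""
    if f_hi is None:
        f_hi = 0.5 * fs - 100.0  # hypothetical default, kept below Nyquist
    freqs = mel_to_hz(np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), num_freqs))
    env = sff_envelopes(signal, fs, freqs, r)
    frame_len = int(round(frame_ms * 1e-3 * fs))
    hop = int(round(hop_ms * 1e-3 * fs))
    num_frames = 1 + (env.shape[1] - frame_len) // hop
    feats = np.empty((num_frames, num_freqs))
    for t in range(num_frames):
        seg = env[:, t * hop: t * hop + frame_len]
        # Average the envelope over the frame, then compress logarithmically.
        feats[t] = np.log(seg.mean(axis=1) + 1e-10)
    return feats, freqs
```

Calling `mssf_ale(samples, fs)` on a mono waveform returns one D-dimensional vector per analysis frame together with the Mel-spaced centre frequencies. The sample-by-sample loop keeps the sketch dependency-free; a practical implementation would likely use an optimised IIR filter routine instead.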
Pages: 183-188
Page count: 6