Pitch-robust acoustic feature using single frequency filtering for children's KWS

Cited by: 7

Authors:

Pattanayak, Biswaranjan [1]

Pradhan, Gayadhar [1]

Affiliations:

[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna, Bihar, India

Source:

PATTERN RECOGNITION LETTERS | 2021 / Vol. 150

Keywords:

KWS; Pitch; Speaking rate; Single frequency filter; Pitch robust feature; EPOCH EXTRACTION; SPEECH RECOGNITION; SYSTEM; NOISE;

DOI:

10.1016/j.patrec.2021.07.015

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Pitch and speaking rate are two significant factors that cause acoustic mismatch in children's keyword spotting (KWS) systems. This paper proposes a pitch-robust acoustic feature based on single frequency filtering (SFF) for the development of a children's KWS system. In the proposed approach using SFF, the amplitude envelopes (AEs) of the speech data are computed at D selected frequencies spaced on the Mel scale. The AEs are then averaged over short-time overlapping analysis frames and logarithmically compressed to give a D-dimensional feature vector per analysis frame, here termed the Mel spaced single frequency average log envelope (MSSF-ALE). With the proposed MSSF-ALE feature, improved performance is observed for the deep neural network-hidden Markov model-based KWS system over the standard Mel-frequency cepstral coefficients (MFCC) and over MFCC extracted from the smoothed spectra. A relative improvement of 104.44% in term-weighted value (TWV) for children's KWS is observed over MFCC by using MSSF-ALE. The performance of the KWS system is then evaluated with data-augmented training through explicit speaking rate modification of the training data set. MSSF-ALE provides a relative improvement of 195.94% in TWV over MFCC with the data-augmented training. MSSF-ALE also performs better than the other explored features in noisy test cases. (c) 2021 Elsevier B.V. All rights reserved.
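The feature extraction steps described in the abstract (amplitude envelopes at D Mel-spaced frequencies, frame-wise averaging, logarithmic compression) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes one commonly described formulation of SFF (difference the signal, shift each frequency of interest to half the sampling rate, and apply a single-pole filter with its pole at z = -r). The function names, the values D = 40 and r = 0.995, the frequency range, the standard 2595/700 Mel formula, and the 25 ms / 10 ms framing are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np
from scipy.signal import lfilter


def hz_to_mel(f):
    # Common Mel-scale formula (an assumption; the abstract names no variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_frequencies(num_freqs, f_min, f_max):
    # D frequencies (in Hz) equally spaced on the Mel scale.
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), num_freqs)
    return mel_to_hz(mels)


def mssf_ale(x, fs, num_freqs=40, f_min=100.0, f_max=None,
             r=0.995, frame_len=0.025, frame_shift=0.010):
    """Hypothetical MSSF-ALE-style features, shape (num_frames, num_freqs)."""
    x = np.asarray(x, dtype=float)
    if f_max is None:
        f_max = 0.95 * fs / 2.0  # stay slightly below Nyquist (assumption)
    freqs = mel_spaced_frequencies(num_freqs, f_min, f_max)

    # Difference the signal, as in the assumed SFF formulation.
    x = np.diff(x, prepend=x[0])
    n = np.arange(len(x))

    win = int(round(frame_len * fs))
    hop = int(round(frame_shift * fs))
    num_frames = 1 + (len(x) - win) // hop
    feats = np.empty((num_frames, num_freqs))

    for k, f in enumerate(freqs):
        # Shift the component at f to fs/2 (normalized frequency pi).
        shift = np.pi - 2.0 * np.pi * f / fs
        shifted = x * np.exp(1j * shift * n)
        # Single-pole filter H(z) = 1 / (1 + r z^-1), pole at z = -r.
        y = lfilter([1.0], [1.0, r], shifted)
        env = np.abs(y)  # amplitude envelope at frequency f
        for t in range(num_frames):
            frame = env[t * hop: t * hop + win]
            # Average the envelope over the frame, then log-compress.
            feats[t, k] = np.log(frame.mean() + 1e-10)
    return feats
```

As a usage sketch, calling `mssf_ale(x, 16000)` on one second of 16 kHz audio returns one 40-dimensional vector per 10 ms frame shift. A pure tone placed at one of the selected frequencies produces its largest average log envelope in the corresponding dimension.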


Pages: 183-188

Number of pages: 6
