Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition

Cited by: 18

Authors:

Shahnawazuddin, Syed [1]

Sinha, Rohit [2]

Pradhan, Gayadhar [1]

Affiliations:

[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna 800005, Bihar, India

[2] Indian Inst Technol, Dept Elect & Elect Engn, Gauhati 781039, India

Source:

IEEE SIGNAL PROCESSING LETTERS | 2017 / Vol. 24 / No. 08

Keywords:

Automatic speech recognition (ASR); deep neural network (DNN); pitch-adaptive features; spectral smoothening; subspace Gaussian mixture model (SGMM); GAUSSIAN MIXTURE MODEL; REPRESENTATIONS; NOISE;

DOI:

10.1109/LSP.2017.2705085

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

In this letter, the effectiveness of the recently reported SMAC (Spectral Moment time-frequency distribution Augmented by low-order Cepstral) features is evaluated for robust automatic speech recognition (ASR). The SMAC features consist of normalized first central spectral moments appended with low-order cepstral coefficients, and were designed to be robust to both additive noise and pitch variations. We explore the SMAC features in a severe pitch-mismatch ASR task, namely decoding children's speech on an ASR system trained on adults' speech. In this task, the SMAC features are still observed to be sensitive to pitch variations. To address this, a simple spectral smoothening approach employing adaptive cepstral truncation is applied prior to the computation of the spectral moments. With the proposed modification, the SMAC features achieve enhanced pitch robustness without affecting their noise immunity. Furthermore, the effectiveness of the proposed features is explored in three dominant acoustic modeling paradigms and under varying data conditions. In all cases, the proposed features are observed to significantly outperform the existing ones.
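
The smoothening step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no parameters, so the function names (`cepstral_smooth`, `band_first_moments`), the 0.8 fraction of the pitch period used as the truncation point, the FFT size, and the equal-width band layout are all assumptions made for illustration. The sketch truncates the real cepstrum below the pitch period to suppress pitch harmonics in the log spectrum, then computes a band-wise first-order spectral moment rescaled to [0, 1]. The exact SMAC moment definition may differ.

```python
import numpy as np


def cepstral_smooth(frame, sample_rate, f0_hz, n_fft=512):
    """Return (raw, smoothed) log-magnitude spectra of one speech frame.

    Smoothing is done by zeroing cepstral coefficients at and above a
    cutoff quefrency tied to the pitch period (pitch-adaptive truncation).
    """
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-10)

    # Real cepstrum: inverse FFT of the (real, even) log-magnitude spectrum.
    cepstrum = np.fft.irfft(log_mag, n_fft)

    # ASSUMPTION: cut at 80% of the pitch period (in samples), so the
    # cepstral peak caused by pitch harmonics is discarded.
    cutoff = int(0.8 * sample_rate / f0_hz)
    cutoff = max(1, min(cutoff, n_fft // 2))

    # Symmetric low-quefrency lifter (the real cepstrum is symmetric).
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0
    lifter[n_fft - cutoff + 1:] = 1.0

    smoothed_log_mag = np.fft.rfft(cepstrum * lifter).real
    return log_mag, smoothed_log_mag


def band_first_moments(power, n_bands=8):
    """Band-wise first-order spectral moment, rescaled to [0, 1] per band.

    ASSUMPTION: equal-width bands over FFT bins; a stand-in for the
    normalized spectral moments mentioned in the abstract.
    """
    bins = np.arange(len(power), dtype=float)
    edges = np.linspace(0, len(power), n_bands + 1).astype(int)
    moments = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p, k = power[lo:hi], bins[lo:hi]
        centroid = np.sum(k * p) / (np.sum(p) + 1e-10)
        moments.append((centroid - k[0]) / max(k[-1] - k[0], 1.0))
    return np.array(moments)
```

In use, one would estimate the pitch of each frame with a pitch tracker, pass it as `f0_hz`, and compute the moments on `np.exp(2 * smoothed_log_mag)`, the smoothed power spectrum. A higher pitch (as in children's speech) gives a shorter pitch period and therefore a more aggressive truncation.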


Pages: 1128 - 1132

Number of pages: 5

