Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition

Cited by: 18
Authors
Shahnawazuddin, Syed [1]
Sinha, Rohit [2]
Pradhan, Gayadhar [1]
Affiliations
[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna 800005, Bihar, India
[2] Indian Inst Technol, Dept Elect & Elect Engn, Gauhati 781039, India
Keywords
Automatic speech recognition (ASR); deep neural network (DNN); pitch-adaptive features; spectral smoothening; subspace Gaussian mixture model (SGMM); GAUSSIAN MIXTURE MODEL; REPRESENTATIONS; NOISE;
DOI
10.1109/LSP.2017.2705085
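Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract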
In this letter, the effectiveness of the recently reported SMAC (Spectral Moment time-frequency distribution Augmented by low-order Cepstral) features is evaluated for robust automatic speech recognition (ASR). The SMAC features consist of normalized first central spectral moments appended with low-order cepstral coefficients. These features were designed to be robust to both additive noise and pitch variations. We explore the SMAC features in a severe pitch-mismatch ASR task, i.e., decoding children's speech on an ASR system trained on adults' speech. In this task, the SMAC features are still observed to be sensitive to pitch variations. To address this, a simple spectral smoothening approach employing adaptive cepstral truncation is explored prior to the computation of spectral moments. With the proposed modification, the SMAC features are noted to achieve enhanced pitch robustness without affecting their noise immunity. Furthermore, the effectiveness of the proposed features is explored in three dominant acoustic modeling paradigms and under varying data conditions. In all cases, the proposed features are observed to significantly outperform the existing ones.
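As a rough illustration of the spectral smoothening idea mentioned in the abstract, the sketch below low-time lifters the real cepstrum of one frame so that the harmonic ripple caused by pitch is suppressed before any spectral moments would be computed. This is not the authors' implementation: the function name `smooth_log_spectrum`, the `fraction` parameter, and the rule that ties the truncation point to a fraction of the pitch period are all assumptions made for illustration, and the pitch value is taken as given rather than estimated.

```python
import numpy as np


def smooth_log_spectrum(frame, sample_rate, pitch_hz, fraction=0.8):
    """Smooth a frame's log-magnitude spectrum by truncating its real cepstrum.

    Hypothetical adaptive rule (an assumption, not from the paper): keep only
    quefrencies shorter than `fraction` of the pitch period in samples.
    """
    n = len(frame)
    log_mag = np.log(np.abs(np.fft.fft(frame * np.hamming(n))) + 1e-10)
    cepstrum = np.fft.ifft(log_mag).real  # real cepstrum, symmetric in quefrency

    # Truncation point adapts to the pitch: higher pitch -> shorter lifter.
    cutoff = max(1, int(fraction * sample_rate / pitch_hz))
    cutoff = min(cutoff, n // 2)

    # Symmetric low-time lifter so the smoothed log spectrum stays real.
    lifter = np.zeros(n)
    lifter[:cutoff] = 1.0
    lifter[n - cutoff + 1:] = 1.0

    smoothed = np.fft.fft(cepstrum * lifter).real
    return smoothed[: n // 2 + 1]  # non-negative frequencies only


# Usage on a synthetic harmonic frame with a child-like pitch of 300 Hz.
sample_rate = 16000
t = np.arange(512) / sample_rate
frame = sum(np.sin(2 * np.pi * 300 * h * t) / h for h in range(1, 9))
smoothed = smooth_log_spectrum(frame, sample_rate, pitch_hz=300.0)
```

Because the lifter length shrinks as the pitch rises, a high-pitched (child) frame is smoothed more aggressively than a low-pitched (adult) one, which is the intuition behind making the truncation adaptive.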
Pages: 1128-1132
Number of pages: 5
Related papers
30 in total
  • [1] Effect of pitch enhancement in Punjabi children's speech recognition system under disparate acoustic conditions
    Bhardwaj, Vivek
    Kukreja, Vinay
    APPLIED ACOUSTICS, 2021, 177
  • [2] Pitch-Adaptive Front-end Features for Robust Children's ASR
    Shahnawazuddin, S.
    Dey, Abhishek
    Sinha, Rohit
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016 : 3459 - 3463
  • [3] Robust children's speech recognition in zero resource condition
    Shahnawazuddin, S.
    Kumar, Avinash
    Kumar, Vinit
    Kumar, Saurabh
    Ahmad, Waquar
    APPLIED ACOUSTICS, 2022, 185
  • [4] Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation
    Yoon, Yang-Soo
    Li, Yongxin
    Fu, Qian-Jie
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2012, 55 (01) : 105 - 124
  • [5] Enhancing the magnitude spectrum of speech features for robust speech recognition
    Hung, Jeih-weih
    Fan, Hao-teng
    Tu, Wen-hsiang
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2012,
  • [6] Assessment of pitch-adaptive front-end signal processing for children's speech recognition
    Sinha, Rohit
    Shahnawazuddin, S.
    COMPUTER SPEECH AND LANGUAGE, 2018, 48 : 103 - 121
  • [7] Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition
    Kim, Chanwoo
    Stern, Richard M.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (07) : 1315 - 1329
  • [8] Histogram equalization of contextual statistics of speech features for robust speech recognition
    Hsieh, Hsin-Ju
    Chen, Berlin
    Hung, Jeih-weih
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (17) : 6769 - 6795
  • [9] Locally Normalized Filter Banks Applied to Deep Neural-Network-Based Robust Speech Recognition
    Fredes, Josue
    Novoa, Jose
    King, Simon
    Stern, Richard M.
    Becerra Yoma, Nestor
    IEEE SIGNAL PROCESSING LETTERS, 2017, 24 (04) : 377 - 381
  • [10] Pitch-robust acoustic feature using single frequency filtering for children's KWS
    Pattanayak, Biswaranjan
    Pradhan, Gayadhar
    PATTERN RECOGNITION LETTERS, 2021, 150 : 183 - 188