Adaptive framing based similarity measurement between time warped speech signals using Kalman filter

Cited by: 2
Authors
Khan W. [1]
Crockett K. [1]
Bilal M. [2]
Affiliations
[1] School of Mathematics, Computing and Digital Technology, Manchester Metropolitan University, Manchester
[2] Institute of the Environment and Sustainability, University of California, Los Angeles
Keywords
Adaptive speech segmentation; Dynamic time warping; Kalman filter; Speech processing; Spoken term detection
DOI
10.1007/s10772-018-9511-z
Abstract
Similarity measurement between speech signals aims to calculate the degree of similarity using acoustic features, and it has been receiving much interest because of the large volume of multimedia information that must be processed. However, dynamic properties of speech signals, such as varying silence segments and the time warping factor, make it more challenging to measure the similarity between speech signals. This manuscript extends our earlier research on adaptive framing based similarity measurement between speech signals using a Kalman filter. Silence removal is enhanced by integrating multiple features for detecting voiced and unvoiced speech segments. The adaptive frame size measurement is improved by using the acceleration/deceleration phenomenon of object linear motion. A dominant feature set is used to represent the speech signals, along with pre-calculated model parameters that are set by offline tuning of a Kalman filter. Performance is evaluated on additional datasets to assess the impact of the proposed model and silence removal approach on time warped speech similarity measurement. Detailed statistical results indicate an overall accuracy improvement from 91% to 98%, which demonstrates the superiority of the extended approach over our previous work on time warped continuous speech similarity measurement. © 2018, The Author(s).
Pages: 343-354
Number of pages: 11