A Modified Speaking Rate Estimation Based on Frame-Level LSTM

Citations: 0
Authors
Xiao, Yanhong [1]
Du, Shixuan [1]
Xie, Xiang [1]
Wang, Jing [1]
Zhan, Qingran [1]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) | 2018
Keywords
frame-level LSTM; speaking rate estimation; segmentation; SPEECH;
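DOI
Not available
Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract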
Speaking rate has applications in many domains, such as speech recognition, speaker verification, and emotion recognition. It conveys long-term information in speech and changes over time, so it can be treated as a time sequence. This paper proposes a frame-level LSTM method for speaking rate estimation. Instead of taking the whole utterance as one sequence, the frame-level LSTM exploits the sequence information within each segment and yields a more precise per-segment speaking rate estimate. We also evaluate the influence of fixed-length segmentation and voice activity detection (VAD) segmentation on speaking rate estimation. Results show that the proposed frame-level LSTM method yields a high correlation between the estimated speaking rate and the ground truth. It achieves a relative improvement of 13.0% over the state-of-the-art statistical learning method and 16.3% over support vector regression (SVR), evaluated on the same TIMIT corpus.
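The two evaluation ingredients named in the abstract, fixed-length segmentation and the correlation between estimated and ground-truth speaking rate, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the segment length, the hop size, and the per-segment unit counts are all hypothetical, and the LSTM estimator itself is not shown.

```python
import math


def fixed_length_segments(frames, seg_len, hop=None):
    """Split a frame sequence into fixed-length segments.

    seg_len and hop are counted in frames (hypothetical values; the record
    does not state the segment length used). A trailing remainder shorter
    than seg_len is dropped.
    """
    if hop is None:
        hop = seg_len  # non-overlapping segments by default
    return [frames[i:i + seg_len]
            for i in range(0, len(frames) - seg_len + 1, hop)]


def segment_rates(unit_counts, seg_dur_s):
    """Speaking rate per segment, in units (e.g. syllables) per second."""
    return [count / seg_dur_s for count in unit_counts]


def pearson(x, y):
    """Pearson correlation between estimated and reference rates."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    frames = list(range(10))                  # stand-in for 10 feature frames
    print(fixed_length_segments(frames, 4))   # two segments of 4 frames each
    reference = segment_rates([3, 5, 4], seg_dur_s=0.5)
    estimated = [5.5, 9.0, 8.5]               # made-up model outputs
    print(pearson(estimated, reference))
```

A VAD-based variant would replace `fixed_length_segments` with boundaries taken from a speech/non-speech detector, so that segment durations vary and `seg_dur_s` becomes a per-segment value.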
Pages: 600-603
Page count: 4