A Modified Speaking Rate Estimation Based on Frame-Level LSTM

Cited by: 0

Authors:

Xiao, Yanhong [1]

Du, Shixuan [1]

Xie, Xiang [1]

Wang, Jing [1]

Zhan, Qingran [1]

Affiliations:

[1] Beijing Institute of Technology, Beijing, People's Republic of China

Source:

PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) | 2018

Keywords:

frame-level LSTM; speaking rate estimation; segmentation; SPEECH;

DOI:

Not available

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Speaking rate has applications in many domains, such as speech recognition, speaker verification, and emotion recognition. It conveys long-term information in speech and changes over time, so it can be treated as a time sequence. This paper proposes a frame-level LSTM speaking rate estimation method. Instead of taking the whole utterance as a sequence, the frame-level LSTM exploits the sequence information within each segment and yields a more precise segment-level speaking rate estimate. We also evaluate the influence of fixed-length segmentation and voice activity detection (VAD) segmentation on speaking rate estimation. Results show that the proposed frame-level LSTM method yields a high correlation between the estimated speaking rate and the ground truth. It achieves a relative improvement of 13.0% over the state-of-the-art statistical learning method and 16.3% over support vector regression (SVR), evaluated on the same TIMIT corpus.
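As a rough illustration of two ideas mentioned in the abstract, the sketch below splits a frame sequence into fixed-length segments and scores per-segment estimates against ground truth with a correlation coefficient. It is not the authors' implementation: the function names, the segment length and hop parameters, and the choice of Pearson correlation are assumptions made for illustration, since the abstract does not specify them.

```python
import math


def fixed_length_segments(frames, seg_len, hop=None):
    """Split a frame sequence into fixed-length segments.

    Hypothetical helper: seg_len and hop are counted in frames, and a
    trailing partial segment is dropped. The abstract gives no values.
    """
    hop = hop or seg_len
    return [frames[i:i + seg_len]
            for i in range(0, len(frames) - seg_len + 1, hop)]


def pearson(estimated, truth):
    """Pearson correlation between two equal-length lists of rates.

    Assumes both lists have nonzero variance.
    """
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_t = sum(truth) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimated, truth))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    return cov / math.sqrt(var_e * var_t)
```

In such a setup, each segment returned by `fixed_length_segments` would be fed to a sequence model as one input sequence, and `pearson` would compare the resulting per-segment estimates with reference rates.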


Pages: 600-603

Page count: 4
