Eliciting meaningful units from speech

Cited by: 3
Authors
Kocharov, Daniil [1]
Kachkovskaia, Tatiana [1]
Skrelin, Pavel [1]
Affiliations
[1] St Petersburg State Univ, St Petersburg, Russia
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Funding
Russian Science Foundation
Keywords
phonetics; prosody; syntax; automatic intonational phrases detection; BOUNDARIES;
DOI
10.21437/Interspeech.2017-855
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Elicitation of information structure from speech is a crucial step in automatic speech understanding. In terms of both production and perception, we consider the intonational phrase to be the basic meaningful unit of information structure in speech. The current paper presents a method of detecting these units in speech by processing both the recorded speech and its textual representation. Using syntactic information, we split the text into small groups of closely connected words. Assuming that intonational phrases are built from these small groups, we use acoustic information to reveal their actual boundaries. The procedure was initially developed for processing Russian speech, and we have achieved the best published results for this language, with an F1 score of 0.91. We assume that it may be adapted for other languages that have some amount of read speech resources, including under-resourced languages. For comparison, we have evaluated it on English material (Boston University Radio Speech Corpus). Our result, an F1 score of 0.76, is comparable with the top systems designed for English.
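The F1 figures quoted in the abstract summarize how well detected phrase boundaries agree with reference boundaries. The abstract does not say how a detected boundary is matched to a reference one, so the sketch below is only an illustration under an assumed criterion: a boundary counts as correct when it falls after exactly the same word as a reference boundary. The function name `boundary_f1` and the toy word indices are hypothetical and are not taken from the paper.

```python
def boundary_f1(predicted, reference):
    """F1 over boundary positions, given as indices of the words
    that are followed by an intonational phrase boundary.

    Assumption: exact-position matching; the paper may use a
    different matching criterion.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)   # share of detected boundaries that are correct
    recall = true_pos / len(reference)      # share of reference boundaries that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy example: word indices followed by a boundary.
reference = {3, 7, 12, 18}
predicted = {3, 7, 13, 18, 21}
score = boundary_f1(predicted, reference)
```

In this toy case three of the five detected boundaries are correct and three of the four reference boundaries are found, so precision is 0.6 and recall is 0.75.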
Pages: 2128-2132
Number of pages: 5