Evaluating Intention Communication by TTS using Explicit Definitions of Illocutionary Act Performance

Cited by: 1
Authors
Hojo, Nobukatsu [1]
Miyazaki, Noboru [1]
Affiliation
[1] NTT Commun Sci Labs, Atsugi, Kanagawa, Japan
Source
INTERSPEECH 2019 | 2019
Keywords
TTS evaluation; spoken dialogue systems; speech synthesis; dialogue act; felicity conditions; SPEECH; CUES;
DOI
10.21437/Interspeech.2019-2188
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Text-to-speech (TTS) synthesis systems have been evaluated with respect to attributes such as quality, naturalness and intelligibility. However, an evaluation protocol with respect to communication of intentions has not yet been established. Evaluating intention communication sometimes produces unreliable results because participants can misinterpret the definitions of intentions. This misinterpretation is caused by the colloquial and implicit description of intentions. To address this problem, this work explicitly defines each intention following theoretical definitions, "felicity conditions", in speech-act theory. We define the communication of each intention with one to four necessary and sufficient conditions to be satisfied. In listening tests, participants rated whether each condition was satisfied or not. We compared the proposed protocol with the conventional baseline using four different voice conditions: neutral TTS, conversational TTS with and without intention inputs, and recorded speech. The experimental results with 10 participants showed that the proposed protocol produced smaller within-group variation and larger between-group variation. These results indicate that the proposed protocol can be used to evaluate intention communication with higher inter-rater reliability and sensitivity.
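The scoring idea in the abstract can be illustrated with a minimal sketch. It assumes that an intention counts as communicated only when a listener marks every one of its conditions as satisfied (following the "necessary and sufficient" wording), and that within-group and between-group variation are measured as one-way ANOVA mean squares over voice conditions. The abstract does not state the paper's exact aggregation or statistics, so both are assumptions; the function names and the ratings below are hypothetical and are not data from the paper.

```python
from statistics import mean


def intention_communicated(condition_ratings):
    """True only if the listener judged every condition as satisfied (assumed rule)."""
    return all(condition_ratings)


def variance_components(scores_by_group):
    """Return (within, between) mean squares, as in a one-way ANOVA (assumed measure)."""
    groups = list(scores_by_group.values())
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    # Spread of individual scores around their own group's mean.
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    # Spread of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    df_within = len(all_scores) - len(groups)
    df_between = len(groups) - 1
    return ss_within / df_within, ss_between / df_between


# Hypothetical ratings: one list of per-condition judgments per listener.
ratings = {
    "neutral_tts": [[True, False], [False, False], [True, False]],
    "recorded": [[True, True], [True, True], [True, False]],
}
scores = {
    voice: [float(intention_communicated(r)) for r in listener_ratings]
    for voice, listener_ratings in ratings.items()
}
within, between = variance_components(scores)
print(f"within-group: {within:.3f}, between-group: {between:.3f}")
```

Under these assumptions, a protocol that yields a smaller within-group value and a larger between-group value separates the voice conditions more clearly while listeners agree more with one another.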
Pages: 1536-1540
Page count: 5