Detecting Topic-Oriented Speaker Stance in Conversational Speech

Cited by: 3
Authors
Lai, Catherine [1 ]
Alex, Beatrice [1 ,2 ]
Moore, Johanna D. [1 ]
Tian, Leimin [3 ]
Hori, Tatsuro [4 ]
Francesca, Gianpiero [5 ]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
[2] Univ Edinburgh, Sch Literatures Languages & Cultures, Edinburgh Futures Inst, Edinburgh, Midlothian, Scotland
[3] Monash Univ, Comp Human Interact & Creat, Melbourne, Vic, Australia
[4] Toyota Motor Co Ltd, Tokyo, Japan
[5] Toyota Motor Europe, Brussels, Belgium
Source
INTERSPEECH 2019 | 2019
Keywords
spoken language understanding; affective computing; stance; computational paralinguistics; spoken dialogue;
DOI
10.21437/Interspeech.2019-2632
Abstract
Being able to detect topics and speaker stances in conversations is a key requirement for developing spoken language understanding systems that are personalized and adaptive. In this work, we explore how topic-oriented speaker stance is expressed in conversational speech. To do this, we present a new set of topic and stance annotations of the CallHome corpus of spontaneous dialogues. Specifically, we focus on six stances (positivity, certainty, surprise, amusement, interest, and comfort) which are useful for characterizing important aspects of a conversation, such as whether a conversation is going well or not. Based on this, we investigate the use of neural network models for automatically detecting speaker stance from speech in multi-turn, multi-speaker contexts. In particular, we examine how performance changes depending on how input feature representations are constructed and how this is related to dialogue structure. Our experiments show that incorporating both lexical and acoustic features is beneficial for stance detection. However, we observe variation in whether using hierarchical models for encoding lexical and acoustic information improves performance, suggesting that some aspects of speaker stance are expressed more locally than others. Overall, our findings highlight the importance of modelling interaction dynamics and non-lexical content for stance detection.
Pages: 46-50 (5 pages)
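As a rough illustration of the modelling setup the abstract describes — fusing lexical and acoustic features per utterance, then pooling over turns for a dialogue-level stance score — the following NumPy sketch shows the data flow. All dimensions, the random projection, and the pooling choice are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions): an 88-d acoustic vector per utterance
# (GeMAPS-sized) and a 50-d averaged word embedding per utterance.
N_UTTS, ACOUSTIC_DIM, EMBED_DIM = 4, 88, 50
acoustic = rng.normal(size=(N_UTTS, ACOUSTIC_DIM))
lexical = rng.normal(size=(N_UTTS, EMBED_DIM))

# Utterance level: early fusion by concatenating the two feature views.
utt_feats = np.concatenate([acoustic, lexical], axis=1)  # (N_UTTS, 138)

# Hierarchical encoding stand-in: encode each utterance first, then
# pool across turns to get a dialogue-level representation.
W = rng.normal(size=(utt_feats.shape[1], 16)) * 0.1
utt_encoded = np.tanh(utt_feats @ W)     # per-utterance encoding
dialogue_vec = utt_encoded.mean(axis=0)  # pooled dialogue context

# Binary stance score (e.g. "positivity present") via a logistic output.
w_out = rng.normal(size=16)
score = 1.0 / (1.0 + np.exp(-(dialogue_vec @ w_out)))
print(round(float(score), 3))
```

A flat (non-hierarchical) variant would skip the per-utterance encoding step and pool the raw fused features directly; the paper's finding that hierarchy helps for some stances but not others corresponds to choosing between these two pipelines per stance.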