Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers

Cited by: 0
Authors
Georgila, Kallirroi [1]
Gordon, Carla [1]
Yanov, Volodymyr [1]
Traum, David [1]
Affiliations
[1] Univ Southern Calif, Inst Creat Technol, 12015 Waterfront Dr, Los Angeles, CA 90094 USA
Keywords
dialogue evaluation functions; real and simulated dialogues; Internet of Things; user simulation
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We collected a corpus of dialogues in a Wizard of Oz (WOz) setting in the Internet of Things (IoT) domain. We asked users participating in these dialogues to rate the system on a number of aspects, namely, intelligence, naturalness, personality, friendliness, their enjoyment, overall quality, and whether they would recommend the system to others. Then we asked dialogue observers, i.e., Amazon Mechanical Turkers (MTurkers), to rate these dialogues on the same aspects. We also generated simulated dialogues between dialogue policies and simulated users and asked MTurkers to rate them again on the same aspects. Using linear regression, we developed dialogue evaluation functions based on features from the simulated dialogues and the MTurkers' ratings, the WOz dialogues and the MTurkers' ratings, and the WOz dialogues and the WOz participants' ratings. We applied all these dialogue evaluation functions to a held-out portion of our WOz dialogues, and we report results on the predictive power of these different types of dialogue evaluation functions. Our results suggest that for three conversational aspects (intelligence, naturalness, overall quality) just training evaluation functions on simulated data could be sufficient.
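The regression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two features (number of turns, fraction of misunderstood requests), the toy ratings, and the helper names `fit_linear`, `predict`, and `rmse` are all assumptions made for illustration, and the numbers are invented rather than taken from the corpus. The sketch fits an ordinary least squares evaluation function on one set of dialogues (standing in for simulated dialogues rated by observers) and measures its error on a held-out set (standing in for WOz dialogues rated by participants).

```python
from math import sqrt
from typing import List, Sequence


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a constant 1 for the intercept
    n = len(rows[0])
    # A = X^T X and b = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            factor = A[k][col] / A[col][col]
            for j in range(col, n):
                A[k][j] -= factor * A[col][j]
            b[k] -= factor * b[col]
    # Back substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - s) / A[i][i]
    return w  # [intercept, weight_1, ..., weight_k]


def predict(w: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply the learned evaluation function to each feature vector."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]


def rmse(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Root mean squared error between predicted and actual ratings."""
    return sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))


# Toy, invented data (NOT from the paper). Each row is one dialogue:
# (number of turns, fraction of misunderstood requests).
train_features = [(4, 0.0), (6, 0.25), (8, 0.5), (10, 0.1), (5, 0.4)]
train_ratings = [5.2, 4.05, 2.9, 3.7, 3.8]  # hypothetical observer ratings

held_out_features = [(7, 0.2), (9, 0.3)]
held_out_ratings = [4.0, 3.3]  # hypothetical participant ratings

weights = fit_linear(train_features, train_ratings)
held_out_rmse = rmse(predict(weights, held_out_features), held_out_ratings)
print("weights:", weights)
print("held-out RMSE:", held_out_rmse)
```

In this setup, comparing the held-out error of functions trained on different sources (simulated versus WOz dialogues, observer versus participant ratings) is what indicates whether training on simulated data alone might suffice for a given aspect.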
Pages: 726-734
Page count: 9
Related papers
50 results in total
  • [31] Learning Dialogue POMDP Models from Data
    Chinaei, Hamid R.
    Chaib-draa, Brahim
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2011, 6657 : 86 - 91
  • [32] PREDICTING PILOT RATINGS OF MULTI-AXIS CONTROL TASKS FROM SINGLE-AXIS DATA
    DANDER, VA
    IEEE TRANSACTIONS ON HUMAN FACTORS IN ENGINEERING, 1963, HFE4 (01): : 15 - &
  • [33] Predicting Continuous Stress Ratings Of Multiple Labellers From Physiological Signals
    Hoenig, F.
    Batliner, A.
    Eskofier, B.
    Noeth, E.
    ANALYSIS OF BIOMEDICAL SIGNALS AND IMAGES, 2008, : 363 - 368
  • [34] Predicting Holistic Ratings of Written Performance Assessments from Analytic Scoring
    Sharon Cadman Slater
    John R. Boulet
    Advances in Health Sciences Education, 2001, 6 : 103 - 119
  • [35] Predicting New TV Series Ratings from their Pilot Episode Scripts
    Hunter, Starling David
    Smith, Susan
    Chinta, Ravi
    INTERNATIONAL JOURNAL OF ENGLISH LINGUISTICS, 2016, 6 (05) : 1 - 11
  • [36] Predicting emotion in spoken dialogue from multiple knowledge sources
    Forbes-Riley, K
    Litman, DJ
    HLT-NAACL 2004: HUMAN LANGUAGE TECHNOLOGY CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE MAIN CONFERENCE, 2004, : 201 - 208
  • [37] Predicting holistic ratings of written performance assessments from analytic scoring
    Slater, SC
    Boulet, JR
    ADVANCES IN HEALTH SCIENCES EDUCATION, 2001, 6 (02) : 103 - 119
  • [38] Predicting behaviour from personality trait ratings in chimpanzees (Pan troglodytes)
    Murray, L.
    AMERICAN JOURNAL OF PRIMATOLOGY, 2005, 66 : 125 - 125
  • [39] Artificial Intelligent Human-Computer Dialogue Support Platform for Hospitals
    Xia, Xin
    Ma, Yunlong
    Luo, Ye
    Lu, Jianwei
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2024, 38 (06)
  • [40] Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues
    Sun, Kai
    Zhu, Su
    Chen, Lu
    Yao, Siqiu
    Wu, Xueyang
    Yu, Kai
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2060 - 2064