Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers

Authors
Georgila, Kallirroi [1]
Gordon, Carla [1]
Yanov, Volodymyr [1]
Traum, David [1]
Affiliations
[1] Univ Southern Calif, Inst Creat Technol, 12015 Waterfront Dr, Los Angeles, CA 90094 USA
Keywords
dialogue evaluation functions; real and simulated dialogues; Internet of Things; user simulation
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We collected a corpus of dialogues in a Wizard of Oz (WOz) setting in the Internet of Things (IoT) domain. We asked users participating in these dialogues to rate the system on a number of aspects, namely, intelligence, naturalness, personality, friendliness, their enjoyment, overall quality, and whether they would recommend the system to others. Then we asked dialogue observers, i.e., Amazon Mechanical Turkers (MTurkers), to rate these dialogues on the same aspects. We also generated simulated dialogues between dialogue policies and simulated users and asked MTurkers to rate them again on the same aspects. Using linear regression, we developed dialogue evaluation functions based on features from the simulated dialogues and the MTurkers' ratings, the WOz dialogues and the MTurkers' ratings, and the WOz dialogues and the WOz participants' ratings. We applied all these dialogue evaluation functions to a held-out portion of our WOz dialogues, and we report results on the predictive power of these different types of dialogue evaluation functions. Our results suggest that for three conversational aspects (intelligence, naturalness, overall quality) just training evaluation functions on simulated data could be sufficient.
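The regression step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three per-dialogue features, the synthetic data, and the use of root mean squared error as the measure of predictive power are all assumptions made for illustration. The sketch fits a linear evaluation function on one set of dialogues and scores it on a held-out set.

```python
import numpy as np

def fit_evaluation_function(features, ratings):
    """Fit rating ~ w . features + b by ordinary least squares."""
    # Append a column of ones so the last weight acts as the intercept.
    design = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(design, ratings, rcond=None)
    return weights

def predict_ratings(weights, features):
    """Apply a fitted evaluation function to new dialogue features."""
    design = np.column_stack([features, np.ones(len(features))])
    return design @ weights

def root_mean_squared_error(predicted, actual):
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Synthetic stand-in data (assumption): three hypothetical features per
# dialogue, e.g. number of turns, misunderstandings, confirmation requests.
rng = np.random.default_rng(0)
true_w = np.array([-0.1, -0.5, 0.3])
train_features = rng.uniform(0, 10, size=(200, 3))
train_ratings = train_features @ true_w + 5.0 + rng.normal(0, 0.1, 200)
heldout_features = rng.uniform(0, 10, size=(50, 3))
heldout_ratings = heldout_features @ true_w + 5.0 + rng.normal(0, 0.1, 50)

# Train on one source of dialogues and ratings, evaluate on held-out dialogues.
weights = fit_evaluation_function(train_features, train_ratings)
rmse = root_mean_squared_error(
    predict_ratings(weights, heldout_features), heldout_ratings
)
print("weights:", weights)
print("held-out RMSE:", rmse)
```

In the setting the abstract describes, the training pair would be swapped among simulated dialogues with MTurker ratings, WOz dialogues with MTurker ratings, and WOz dialogues with participant ratings, while the held-out WOz portion stays fixed.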
Pages: 726-734
Page count: 9