Automatic Classification of Screen Gaze and Dialogue in Doctor-Patient-Computer Interactions: Computational Ethnography Algorithm Development and Validation

Cited by: 0
Authors
Helou, Samar [1 ]
Abou-Khalil, Victoria [2 ]
Iacobucci, Riccardo [3 ]
El Helou, Elie [4 ]
Kiyono, Ken [5 ]
Institutions
[1] Osaka Univ, Global Ctr Med Engn & Informat, Osaka, Japan
[2] Kyoto Univ, Acad Ctr Comp & Media Studies, Kyoto, Japan
[3] Kyoto Univ, Grad Sch Engn, Dept Urban Management, Kyoto, Japan
[4] St Joseph Univ, Fac Med, Beirut, Lebanon
[5] Osaka Univ, Grad Sch Engn Sci, Osaka, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
computational ethnography; patient-physician communication; doctor-patient-computer interaction; electronic medical records; pose estimation; gaze; voice activity; dialogue; clinic layout; ELECTRONIC HEALTH RECORD; INTERACTION ANALYSIS SYSTEM; COMMUNICATION-SKILLS; NONVERBAL-COMMUNICATION; PHYSICIANS; CLINICIAN; BEHAVIOR; CONSULTATIONS; PATTERNS; TALK;
DOI
10.2196/25218
CLC Number
R19 [Health Organization and Services (Health Service Management)];
Abstract
Background: The study of doctor-patient-computer interactions is a key research area for examining doctor-patient relationships; however, studying these interactions is costly and obtrusive, as researchers usually set up complex recording mechanisms or intrude on consultations to collect the data and then analyze it manually.

Objective: We aimed to facilitate human-computer and human-human interaction research in clinics by providing a computational ethnography tool: an unobtrusive automatic classifier of screen gaze and dialogue combinations in doctor-patient-computer interactions.

Methods: The classifier's input is video recorded by doctors using their computers' internal camera and microphone. By estimating the key points of the doctor's face and detecting the presence of voice activity, we estimate the type of interaction taking place. The classification output for each video segment is 1 of 4 interaction classes: (1) screen gaze and dialogue, wherein the doctor is gazing at the computer screen while conversing with the patient; (2) dialogue, wherein the doctor is gazing away from the computer screen while conversing with the patient; (3) screen gaze, wherein the doctor is gazing at the computer screen without conversing with the patient; and (4) other, wherein no screen gaze or dialogue is detected. We evaluated the classifier using 30 minutes of video provided by 5 doctors simulating consultations in their clinics, in both semi-inclusive and fully inclusive layouts.

Results: The classifier achieved an overall accuracy of 0.83, a performance similar to that of a human coder. Like the human coder, the classifier was more accurate in fully inclusive layouts than in semi-inclusive layouts.

Conclusions: The proposed classifier can be used by researchers, care providers, designers, medical educators, and others interested in exploring and answering questions related to screen gaze and dialogue in doctor-patient-computer interactions.
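The four-way classification described in the Methods reduces to combining two binary detections per video segment. The sketch below illustrates that combination logic only; the `screen_gaze` and `dialogue` flags stand in for the paper's face key-point (gaze) and voice-activity detectors, which are not reproduced here, and the function name is hypothetical:

```python
def classify_segment(screen_gaze: bool, dialogue: bool) -> str:
    """Map the two binary detections for a video segment to one of the
    four interaction classes named in the abstract.

    screen_gaze -- True if the doctor is estimated to be gazing at the screen
    dialogue    -- True if voice activity (conversation) is detected
    """
    if screen_gaze and dialogue:
        return "screen gaze and dialogue"
    if dialogue:
        return "dialogue"
    if screen_gaze:
        return "screen gaze"
    return "other"
```

Under this scheme, per-segment labels could then be aggregated over a consultation to quantify, for example, the fraction of conversation time the doctor spends looking at the screen.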
Pages: 14