CorrNet: Fine-Grained Emotion Recognition for Video Watching Using Wearable Physiological Sensors

Cited by: 40
Authors
Zhang, Tianyi [1 ,2 ]
El Ali, Abdallah [2 ]
Wang, Chen [3 ,4 ]
Hanjalic, Alan [1 ]
Cesar, Pablo [1 ,2 ]
Affiliations
[1] Delft Univ Technol, Multimedia Comp Grp, NL-2600 AA Delft, Netherlands
[2] Ctr Wiskunde & Informat CWI, NL-1098XG Amsterdam, Netherlands
[3] Xinhuanet, Future Media & Convergence Inst, Beijing 100000, Peoples R China
[4] Xinhua News Agcy, State Key Lab Media Convergence Prod Technol & Sy, Beijing 100000, Peoples R China
Keywords
emotion recognition; video; physiological signals; machine learning; SYSTEM; TECHNOLOGY; FRAMEWORK; SIGNALS; CONTEXT; SET
DOI: 10.3390/s21010052
Chinese Library Classification: O65 [Analytical Chemistry]
Subject Classification Codes: 070302; 081704
Abstract
Recognizing user emotions while they watch short-form videos anytime and anywhere is essential for facilitating video content customization and personalization. However, most existing works either classify a single emotion per video stimulus or are restricted to static, desktop environments. To address this, we propose a correlation-based emotion recognition algorithm (CorrNet) that recognizes the valence and arousal (V-A) of each instance (a fine-grained segment of signals) using only wearable physiological signals (e.g., electrodermal activity, heart rate). CorrNet takes advantage of features both inside each instance (intra-modality features) and between different instances of the same video stimulus (correlation-based features). We first test our approach on an indoor-desktop affect dataset (CASE), and thereafter on an outdoor-mobile affect dataset (MERCA), which we collected using a smart wristband and a wearable eye tracker. Results show that for subject-independent binary (high-low) classification, CorrNet yields promising recognition accuracies: 76.37% and 74.03% for V-A on CASE, and 70.29% and 68.15% for V-A on MERCA. Our findings show that: (1) instance segment lengths between 1 and 4 s yield the highest recognition accuracies; (2) accuracies of laboratory-grade and wearable sensors are comparable, even at low sampling rates (<= 64 Hz); and (3) large amounts of neutral V-A labels, an artifact of continuous affect annotation, result in varied recognition performance.
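The abstract's pipeline — segment each physiological signal into fine-grained instances, then combine features computed inside each instance with features computed across instances of the same stimulus — can be illustrated with a minimal sketch. This is not the authors' implementation: the window length, the simple statistics standing in for learned intra-modality features, and the Pearson correlation matrix standing in for CorrNet's learned correlation-based features are all illustrative assumptions.

```python
import numpy as np

def segment(signal, fs, win_s=2.0):
    """Split a 1-D physiological signal into fixed-length instances
    (win_s chosen inside the 1-4 s range the paper reports as best)."""
    n = int(fs * win_s)
    k = len(signal) // n
    return signal[:k * n].reshape(k, n)

def intra_features(instances):
    """Per-instance statistics: a hand-crafted stand-in for the
    learned intra-modality features described in the abstract."""
    return np.stack([instances.mean(axis=1),
                     instances.std(axis=1),
                     instances.min(axis=1),
                     instances.max(axis=1)], axis=1)

def correlation_features(instances):
    """Pearson correlation of each instance with every other instance
    of the same stimulus: a stand-in for correlation-based features."""
    c = np.corrcoef(instances)   # (k, k) instance-by-instance correlations
    np.fill_diagonal(c, 0.0)     # drop each instance's self-correlation
    return c

fs = 64                               # wearable-grade sampling rate (<= 64 Hz)
rng = np.random.default_rng(0)
eda = rng.standard_normal(fs * 20)    # 20 s of synthetic EDA for illustration
inst = segment(eda, fs, win_s=2.0)    # 10 instances of 2 s each
X = np.hstack([intra_features(inst), correlation_features(inst)])
print(X.shape)                        # (10, 14): 4 intra + 10 correlation features
```

Each row of `X` would then feed a per-instance binary (high-low) V-A classifier; in the paper this feature extraction is learned rather than hand-crafted as above.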
Pages: 1-25 (25 pages)