CorrNet: Fine-Grained Emotion Recognition for Video Watching Using Wearable Physiological Sensors

Cited by: 40
Authors
Zhang, Tianyi [1 ,2 ]
El Ali, Abdallah [2 ]
Wang, Chen [3 ,4 ]
Hanjalic, Alan [1 ]
Cesar, Pablo [1 ,2 ]
Affiliations
[1] Delft Univ Technol, Multimedia Comp Grp, NL-2600 AA Delft, Netherlands
[2] Ctr Wiskunde & Informat CWI, NL-1098XG Amsterdam, Netherlands
[3] Xinhuanet, Future Media & Convergence Inst, Beijing 100000, Peoples R China
[4] Xinhua News Agcy, State Key Lab Media Convergence Prod Technol & Sy, Beijing 100000, Peoples R China
Keywords
emotion recognition; video; physiological signals; machine learning; SYSTEM; TECHNOLOGY; FRAMEWORK; SIGNALS; CONTEXT; SET;
DOI
10.3390/s21010052
Chinese Library Classification: O65 [Analytical Chemistry]
Subject Classification Codes: 070302; 081704
Abstract
Recognizing user emotions while they watch short-form videos anytime and anywhere is essential for facilitating video content customization and personalization. However, most works either classify a single emotion per video stimulus, or are restricted to static, desktop environments. To address this, we propose a correlation-based emotion recognition algorithm (CorrNet) to recognize the valence and arousal (V-A) of each instance (fine-grained segment of signals) using only wearable, physiological signals (e.g., electrodermal activity, heart rate). CorrNet takes advantage of features both inside each instance (intra-modality features) and between different instances for the same video stimulus (correlation-based features). We first test our approach on an indoor-desktop affect dataset (CASE), and thereafter on an outdoor-mobile affect dataset (MERCA), which we collected using a smart wristband and a wearable eye tracker. Results show that for subject-independent binary classification (high-low), CorrNet yields promising recognition accuracies: 76.37% and 74.03% for V-A on CASE, and 70.29% and 68.15% for V-A on MERCA. Our findings show that: (1) instance segment lengths between 1 and 4 s result in the highest recognition accuracies; (2) accuracies obtained with laboratory-grade and wearable sensors are comparable, even at low sampling rates (<= 64 Hz); and (3) large amounts of neutral V-A labels, an artifact of continuous affect annotation, result in varied recognition performance.
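To make the instance-based pipeline more concrete, the sketch below shows, under stated assumptions, how a wearable physiological signal could be split into fixed-length instances and how correlations between instances of the same video stimulus could serve as features. This is a minimal NumPy illustration, not the authors' CorrNet implementation: the function names (segment_instances, correlation_features), the 2 s instance length, and the 64 Hz sampling rate are illustrative choices, and CorrNet itself learns such intra-modality and correlation-based features jointly rather than computing them by hand.

```python
# Minimal sketch (assumed, not the authors' code): segment a wearable signal
# into fixed-length instances and use inter-instance correlations as features.
import numpy as np

def segment_instances(signal, sampling_rate, instance_len_s=2.0):
    """Split a 1-D physiological signal (e.g., EDA or heart rate) into
    non-overlapping instances of instance_len_s seconds each."""
    step = int(instance_len_s * sampling_rate)
    n = len(signal) // step
    return np.stack([signal[i * step:(i + 1) * step] for i in range(n)])

def correlation_features(instances):
    """For each instance, use its Pearson correlations with every other
    instance of the same stimulus as a feature vector (a hand-crafted stand-in
    for CorrNet's learned correlation-based features)."""
    corr = np.corrcoef(instances)   # shape: (n_instances, n_instances)
    np.fill_diagonal(corr, 0.0)     # drop the trivial self-correlation
    return corr

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eda = rng.standard_normal(64 * 60)   # 60 s of synthetic EDA at 64 Hz
    inst = segment_instances(eda, sampling_rate=64, instance_len_s=2.0)
    feats = correlation_features(inst)
    print(inst.shape, feats.shape)       # (30, 128) (30, 30)
```

Each row of the resulting feature matrix could then be paired with the V-A label of its instance for binary (high-low) classification, mirroring the per-instance evaluation described in the abstract.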
Pages: 1-25 (25 pages)