An Emotion Recognition Method Based on Eye Movement and Audiovisual Features in MOOC Learning Environment

Cited by: 12
Authors
Bao, Jindi [1 ]
Tao, Xiaomei [2 ,3 ]
Zhou, Yinghui [1 ]
Affiliations
[1] Guilin Univ Technol, Sch Informat Sci & Engn, Guilin 541004, Peoples R China
[2] Guangxi Normal Univ, Sch Comp Sci & Engn, Guilin 541000, Peoples R China
[3] Guangxi Normal Univ, Sch Software, Guilin 541000, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Emotion recognition; feature extraction; massive online open course (MOOC); multimodal analysis;
DOI
10.1109/TCSS.2022.3221128
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
In recent years, more and more people have begun to use massive online open course (MOOC) platforms for distance learning. However, because teachers and students are separated in space and time, the negative emotional states of students in MOOC learning cannot be identified in a timely manner, so students receive no immediate feedback about their emotional states. To identify and classify learners' emotions in video learning scenarios, we propose a multimodal emotion recognition method based on eye movement signals, audio signals, and video images. In this method, two novel features are proposed: the feature of coordinate difference of eye movement (FCDE) and the pixel change rate sequence (PCRS). FCDE is extracted by combining the eye movement coordinate trajectory with the video optical flow trajectory, and it represents the learner's degree of attention. PCRS is extracted from the video images and represents the speed of image switching. A convolutional neural network (CNN)-based feature extraction network (FE-CNN) is designed to extract deep features from the three modalities; these deep features are then input into an emotion classification CNN (EC-CNN) to classify the emotions, namely interest, happiness, confusion, and boredom. In single-modality identification, the recognition accuracies of the three modalities are 64.32%, 74.67%, and 71.88%. The three modalities are combined by feature-level, decision-level, and model-level fusion, and the evaluation experiments show that decision-level fusion achieves the highest emotion recognition accuracy, 81.90%. Finally, the effectiveness of the FCDE, FE-CNN, and EC-CNN modules is verified by ablation experiments.
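The abstract characterizes FCDE as a gaze-versus-optical-flow coordinate difference and PCRS as a per-frame measure of image switching speed, but gives no formulas. The sketch below shows one plausible reading of each feature, assuming the gaze trajectory and the video's optical-flow trajectory have been resampled to a common frame rate and coordinate system; all function and variable names are illustrative and not taken from the paper.

import numpy as np

def fcde(gaze_xy: np.ndarray, flow_xy: np.ndarray) -> np.ndarray:
    """One reading of FCDE: per-frame Euclidean distance between the
    learner's gaze trajectory and the dominant optical-flow trajectory
    of the video. A small distance suggests the gaze is following the
    on-screen motion, i.e., a higher degree of attention.

    gaze_xy, flow_xy: arrays of shape (T, 2) holding (x, y) per frame.
    Returns an array of shape (T,).
    """
    return np.linalg.norm(gaze_xy - flow_xy, axis=1)

def pcrs(frames: np.ndarray) -> np.ndarray:
    """One reading of PCRS: mean absolute grayscale difference between
    consecutive frames, one value per frame transition; larger values
    indicate faster image switching (e.g., slide changes).

    frames: array of shape (T, H, W), grayscale.
    Returns an array of shape (T - 1,).
    """
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return diffs.mean(axis=(1, 2))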
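The fusion rule behind the reported 81.90% decision-level result is likewise not specified in the abstract. A common decision-level scheme is a weighted average of each modality's class posteriors; the following is a minimal sketch under that assumption, with all names illustrative.

import numpy as np

EMOTIONS = ["interest", "happiness", "confusion", "boredom"]

def decision_level_fusion(probs_per_modality, weights=None):
    """Fuse per-modality posteriors by (weighted) averaging.

    probs_per_modality: list of three arrays, each of shape (4,), e.g.,
    the softmax output of EC-CNN for one modality over the four classes.
    weights: optional per-modality weights (e.g., validation accuracies).
    Returns the fused emotion label and the fused probability vector.
    """
    fused = np.average(np.stack(probs_per_modality), axis=0, weights=weights)
    return EMOTIONS[int(np.argmax(fused))], fused

For example, weighting the eye movement, audio, and video branches by their single-modality accuracies (0.6432, 0.7467, 0.7188) would be one natural instantiation of the weights argument.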
Pages: 171-183
Number of pages: 13