Facial Expressions Recognition for Human-Robot Interaction Using Deep Convolutional Neural Networks with Rectified Adam Optimizer

Cited by: 56

Authors:

Melinte, Daniel Octavian [1]

Vladareanu, Luige [1]

Affiliations:

[1] Romanian Acad Inst Solid Mech, Dept Robot & Mechatron, Bucharest 010141, Romania

Source:

Sensors | 2020 / Vol. 20 / Issue 8

Funding:

EU Horizon 2020

Keywords:

computer vision; deep learning; convolutional neural networks; advanced intelligent control; facial emotion recognition; face recognition; NAO robot;

DOI:

10.3390/s20082393

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification codes:

070302 ; 081704 ;

Abstract:

This paper presents the interaction between humans and a NAO robot using deep convolutional neural networks (CNNs). It is based on an innovative end-to-end pipeline that applies two optimized CNNs, one for face recognition (FR) and one for facial expression recognition (FER), in order to obtain real-time inference speed for the entire process. Two different models for FR are considered: one known to be very accurate but with low inference speed (faster region-based convolutional neural network, Faster R-CNN), and one that is less accurate but has high inference speed (single shot detector convolutional neural network, SSD CNN). For emotion recognition, transfer learning and fine-tuning of three CNN models (VGG, Inception V3 and ResNet) were used. The overall results show that the SSD CNN and Faster R-CNN models for face detection have almost the same accuracy: 97.8% for Faster R-CNN on the PASCAL visual object classes (PASCAL VOC) evaluation metrics and 97.42% for SSD Inception. In terms of FER, ResNet obtained the highest training accuracy (90.14%), while the visual geometry group (VGG) network had 87% accuracy and Inception V3 reached 81%. The results show improvements of over 10% when using two serialized CNNs instead of only the FER CNN, while the recent optimization method called rectified adaptive moment optimization (RAdam) led to better generalization and an accuracy improvement of 3%-4% on each emotion recognition CNN.
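The abstract names RAdam but does not state its update rule. As a rough illustration only, the sketch below follows the commonly published RAdam formulation for a single scalar parameter: Adam's adaptive step is scaled by a rectification term once the variance estimate is considered reliable, and a plain momentum step is used before that. The function names, the default hyperparameters, and the toy objective f(x) = x^2 are assumptions made for this example; this is not the authors' training code.

```python
import math


def rectification_term(t, beta2=0.999):
    """Return the RAdam rectification factor r_t, or None while the
    variance estimate is still treated as unreliable (rho_t <= 4)."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
    if rho_t <= 4.0:
        return None
    num = (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
    den = (rho_inf - 4.0) * (rho_inf - 2.0) * rho_t
    return math.sqrt(num / den)


def radam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam update for a scalar parameter; t is the 1-based step count."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment

    r_t = rectification_term(t, beta2)
    if r_t is None:
        # Early steps: fall back to an un-adapted momentum update.
        param -= lr * m_hat
    else:
        v_hat = math.sqrt(v / (1.0 - beta2 ** t))  # bias-corrected second moment
        param -= lr * r_t * m_hat / (v_hat + eps)
    return param, m, v


def minimize_quadratic(steps=2000, lr=0.05):
    """Toy usage (hypothetical objective): minimize f(x) = x^2 from x = 5."""
    x, m, v = 5.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * x
        x, m, v = radam_step(x, grad, m, v, t, lr=lr)
    return x
```

In practice a deep learning framework's built-in RAdam optimizer would be used rather than a hand-written loop; the sketch is only meant to show where the rectification enters the update.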

Pages: 21