A Combined Rule-Based & Machine Learning Audio-Visual Emotion Recognition Approach

Cited by: 52
Authors
Seng, Kah Phooi [1]
Ang, Li-Minn [1]
Ooi, Chien Shing [2]
Affiliations
[1] Charles Sturt Univ, Sch Comp & Math, Bathurst, NSW 2678, Australia
[2] Sunway Univ, Dept Comp Sci & Networked Syst, Subang Jaya 47500, Malaysia
Keywords
Emotion recognition; audio-visual processing; rule-based; machine learning; multimodal system; LINEAR DISCRIMINANT-ANALYSIS; EFFICIENT APPROACH; FACE; FRAMEWORK; FUSION; AUDIO; LDA
DOI
10.1109/TAFFC.2016.2588488
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes an audio-visual emotion recognition system that combines rule-based and machine learning techniques to improve recognition efficacy in the audio and video paths. The visual path uses Bi-directional Principal Component Analysis (BDPCA) and Least-Square Linear Discriminant Analysis (LSLDA) for dimensionality reduction and discrimination. The extracted visual features are passed into a newly designed Optimized Kernel-Laplacian Radial Basis Function (OKL-RBF) neural classifier. The audio path combines prosodic features (pitch, log-energy, zero-crossing rate, and Teager energy operator) with spectral features (Mel-scale frequency cepstral coefficients). The extracted audio features are passed into an audio feature-level fusion module that uses a set of rules to determine the most likely emotion contained in the audio signal. An audio-visual fusion module fuses the outputs from both paths. The performance of the proposed audio path, visual path, and final system is evaluated on standard databases. Experimental results and comparisons show that the proposed system performs well.
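As a rough illustration of three of the prosodic features named in the abstract (log-energy, zero-crossing rate, and the Teager energy operator), the following is a minimal sketch in plain Python. It is not the authors' implementation: this record does not give the paper's framing, windowing, pitch tracking, or MFCC settings, so the function names, the per-frame averaging of the Teager energy, and the synthetic 100 Hz test tone are all assumptions made for the example.

```python
import math


def log_energy(frame, eps=1e-12):
    """Natural log of the summed squared samples in one frame (eps avoids log(0))."""
    return math.log(sum(s * s for s in frame) + eps)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def teager_energy(frame):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [
        frame[n] ** 2 - frame[n - 1] * frame[n + 1]
        for n in range(1, len(frame) - 1)
    ]


def prosodic_features(frame):
    """Hypothetical helper: collect the three features for a single frame."""
    teo = teager_energy(frame)
    return {
        "log_energy": log_energy(frame),
        "zcr": zero_crossing_rate(frame),
        "mean_teo": sum(teo) / len(teo),  # averaging is an assumption
    }


# Synthetic test tone (assumed values, not from the paper).
sample_rate = 8000
freq = 100
amp = 0.5
frame = [
    amp * math.cos(2 * math.pi * freq * n / sample_rate) for n in range(400)
]
features = prosodic_features(frame)
print(features)
```

For a pure sinusoid of amplitude A and digital frequency Ω, the Teager energy operator evaluates to A² sin²(Ω) at every sample, which gives a convenient sanity check for the sketch on the synthetic tone.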
Pages: 3-13
Number of pages: 11