Voice Emotion Recognition Based on Color Histogram Features

Cited by: 0

Authors:

da Rocha, Marcelo Marques ^{[1]}

Conci, Aura ^{[2]}

Muchaluat Saade, Debora Christina ^{[1]}

Affiliations:

[1] Fluminense Fed Univ, MidiaCom Lab, Inst Comp, Niteroi, RJ, Brazil

[2] Fluminense Fed Univ, Visual Lab, Inst Comp, Niteroi, RJ, Brazil

Source:

2023 IEEE 36TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS | 2023

Keywords:

voice recognition; color histogram; sentiment analysis; emotion recognition

DOI:

10.1109/CBMS58004.2023.00241

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Voice is the fastest and most efficient method of communication among humans. Researchers believe that it can also be considered the most efficient means of communication between humans and machines. In addition to conveying useful information, voice can reveal the emotional state of the speaker. For many applications, being able to identify the emotion is crucial, as it allows the application to adapt to the user. In human-robot interaction, recognizing the user's emotion allows the robot to be more empathetic. Alongside other methods, such as recognition of facial expressions and body expressions, speech emotion recognition can be used as an additional component in identifying the user's emotional state. This work proposes recognizing emotion from speech using an approach based on image processing of the spectrogram of the voice audio signal. Two new features based on color histograms are proposed. One thousand six hundred audio files with phrases covering four types of emotion (angry, happy, neutral and sad) were processed and classified. The phrases were spoken by two women: half by a 64-year-old (Subject-64) and the rest by a 26-year-old (Subject-26). These files are a subset of the TESS (Toronto Emotional Speech Set) dataset. When processing Subject-26's voice, precisions of 94.40% and 91.90% were achieved in detecting neutral and sad emotions, respectively. When processing Subject-64's voice, a precision of 97.00% was achieved for the angry emotion. The results show the great potential of the proposal.
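The abstract does not specify the two proposed features, so the following is only a minimal sketch of the general idea it describes: render a spectrogram of the audio as a color image and summarize that image with color histograms. Everything here is an assumption for illustration, not the authors' method: the function names, the frame and hop sizes, the simple blue-to-red colormap, the 16-bin per-channel histogram, and the synthetic tone standing in for a TESS recording.

```python
import numpy as np

def spectrogram_db(signal, frame_len=256, hop=128):
    """Magnitude spectrogram in dB from a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0); transpose to (freq_bins, frames)
    return 20.0 * np.log10(magnitude + 1e-10).T

def to_rgb(spec_db):
    """Hypothetical colormap: scale to [0, 1], then ramp blue -> green -> red."""
    lo, hi = spec_db.min(), spec_db.max()
    x = (spec_db - lo) / (hi - lo) if hi > lo else np.zeros_like(spec_db)
    r = np.clip(2.0 * x - 1.0, 0.0, 1.0)
    g = 1.0 - np.abs(2.0 * x - 1.0)
    b = np.clip(1.0 - 2.0 * x, 0.0, 1.0)
    return (np.stack([r, g, b], axis=-1) * 255).astype(np.uint8)

def color_histogram(rgb, bins=16):
    """Concatenate normalized per-channel histograms into one feature vector."""
    feats = []
    for channel in range(3):
        counts, _ = np.histogram(rgb[..., channel], bins=bins, range=(0, 256))
        feats.append(counts / counts.sum())
    return np.concatenate(feats)

# Synthetic 1-second, 440 Hz tone (assumed stand-in for a real recording)
sr = 8000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

features = color_histogram(to_rgb(spectrogram_db(audio)))
print(features.shape)  # one 16-bin histogram per RGB channel
```

A feature vector like `features` could then be passed to any standard classifier; the abstract does not say which classifier the authors used.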

Citation

Pages: 341-347

Page count: 7
