Music Theory-Inspired Acoustic Representation for Speech Emotion Recognition

Times cited: 7
Authors
Li, Xingfeng [1]
Shi, Xiaohan [2]
Hu, Desheng [3]
Li, Yongwei [4]
Zhang, Qingchen [1]
Wang, Zhengxia [5]
Unoki, Masashi [6]
Akagi, Masato [6]
Affiliations
[1] Hainan Univ, Grad Sch Comp Sci & Technol, Haikou 570288, Peoples R China
[2] Nagoya Univ, Sch Informat Sci, Nagoya 4648601, Japan
[3] Taiyuan Univ Technol, Coll Informat & Comp, Taiyuan 030024, Peoples R China
[4] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[5] Hainan Univ, Sch Comp Sci & Technol, Haikou 570288, Peoples R China
[6] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi 9231292, Japan
Funding
National Natural Science Foundation of China
Keywords
Affective computing; speech emotion recognition; acoustic representation; music theory and speech analysis; PERCEPTION; EXPRESSION; PATTERNS; FEATURES; PITCH; PERSPECTIVE; MODALITIES; KNOWLEDGE; INTERVALS; COGNITION
DOI
10.1109/TASLP.2023.3289312
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This research presents a music theory-inspired acoustic representation (hereafter, MTAR) to improve speech emotion recognition. Emotion recognition in speech and in music has developed in parallel, yet relatively little is understood about how a music theory-inspired acoustic representation can be used to interpret emotion in speech. In the present study, we use music theory to investigate representative acoustic features associated with emotion in speech, drawing on both the vocal emotion expression and the auditory emotion perception domains. In experiments assessing the role and effectiveness of the proposed representation in classifying discrete emotion categories and predicting continuous emotion dimensions, MTAR shows promising performance compared with features widely used for emotion recognition, namely those based on the spectrogram, Mel-spectrogram, Mel-frequency cepstral coefficients (MFCCs), and VGGish, as well as the large baseline feature sets of the INTERSPEECH challenges. This work opens a new research avenue: developing computational acoustic representations of speech emotion grounded in music theory.
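The abstract does not specify how MTAR is constructed. As a hedged illustration of the kind of music-theoretic quantity such a representation can build on (the keywords list PITCH and INTERVALS), the sketch below maps fundamental-frequency (F0) values to intervals in 12-tone equal temperament, where a frequency ratio r corresponds to 12·log2(r) semitones. This is not the paper's MTAR; the function names and the convention of marking unvoiced frames with F0 <= 0 are assumptions made for illustration.

```python
import math

# Illustration only: the standard 12-tone equal-temperament mapping from a
# frequency ratio to a musical interval. This is NOT the paper's MTAR;
# all function names here are hypothetical.

INTERVAL_NAMES = [
    "unison", "minor second", "major second", "minor third", "major third",
    "perfect fourth", "tritone", "perfect fifth", "minor sixth",
    "major sixth", "minor seventh", "major seventh",
]


def semitones(f_hz: float, ref_hz: float) -> float:
    """Signed distance in semitones from ref_hz to f_hz (12 per octave)."""
    if f_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_hz / ref_hz)


def interval_name(f_hz: float, ref_hz: float) -> str:
    """Nearest equal-tempered interval, folded into one octave.

    Because of the folding, an exact octave maps to "unison".
    """
    steps = round(abs(semitones(f_hz, ref_hz))) % 12
    return INTERVAL_NAMES[steps]


def f0_contour_to_intervals(f0_hz):
    """Semitone intervals between consecutive voiced F0 values.

    Assumption for this sketch: unvoiced frames are marked with F0 <= 0
    and are simply skipped.
    """
    voiced = [f for f in f0_hz if f > 0]
    return [semitones(b, a) for a, b in zip(voiced, voiced[1:])]


if __name__ == "__main__":
    print(semitones(220.0, 110.0))       # one octave up
    print(interval_name(330.0, 220.0))   # 3:2 frequency ratio
    print([round(x, 2) for x in f0_contour_to_intervals([200.0, 0.0, 250.0, 300.0])])
```

A 3:2 ratio (for example, 330 Hz over 220 Hz) comes out as about 7.02 semitones, which rounds to a perfect fifth. Measuring pitch movement in semitones rather than hertz makes it invariant to transposition: the same contour shape yields the same intervals for a higher- or lower-pitched speaker.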
Pages: 2534-2547
Number of pages: 14