Musical Emotion Recognition with Spectral Feature Extraction Based on a Sinusoidal Model with Model-Based and Deep-Learning Approaches

Cited by: 5
Authors
Xie, Baijun [1]
Kim, Jonathan C. [1]
Park, Chung Hyuk [1]
Affiliations
[1] George Washington Univ, Dept Biomed Engn, Washington, DC 20052 USA
Source
APPLIED SCIENCES-BASEL | 2020 / Volume 10 / Issue 03
Funding
US National Science Foundation; US National Institutes of Health
Keywords
musical emotion recognition; spectral feature extraction; sinusoidal model; principal component regression; deep learning; machine learning; PITCH;
DOI
10.3390/app10030902
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
This paper presents a method for extracting novel spectral features based on a sinusoidal model. The method is focused on characterizing the spectral shapes of audio signals using spectral peaks in frequency sub-bands. The extracted features are evaluated for predicting the levels of emotional dimensions, namely arousal and valence. Principal component regression, partial least squares regression, and deep convolutional neural network (CNN) models are used as prediction models for the levels of the emotional dimensions. The experimental results indicate that the proposed features include additional spectral information that common baseline features may not include. Since the quality of audio signals, especially timbre, plays a major role in affecting the perception of emotional valence in music, the inclusion of the presented features will contribute to decreasing the prediction error rate.
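As a rough illustration of the pipeline the abstract describes, the sketch below picks one spectral peak per frequency sub-band from a single windowed frame and then fits a principal component regression on such feature vectors. It is not the authors' implementation: the function names (`subband_peak_features`, `pcr_fit`, `pcr_predict`), the equal-width band layout, the single-peak-per-band rule, and all parameter values are assumptions made for illustration, and a plain FFT peak search stands in for a full sinusoidal model.

```python
import numpy as np


def subband_peak_features(signal, sample_rate, n_bands=8, n_fft=1024):
    """Peak magnitude (dB) and peak frequency (Hz) per sub-band for one frame.

    Assumption: equal-width sub-bands and one peak per band (hypothetical layout).
    """
    frame = np.asarray(signal, dtype=float)[:n_fft]
    # Zero-pad short inputs so the window and the frame have the same length.
    frame = np.pad(frame, (0, n_fft - len(frame)))
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    edges = np.linspace(0, len(spectrum), n_bands + 1, dtype=int)
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        k = lo + int(np.argmax(spectrum[lo:hi]))  # strongest bin in this band
        feats.extend([20.0 * np.log10(spectrum[k] + 1e-12), freqs[k]])
    return np.array(feats)


def pcr_fit(X, y, n_components):
    """Principal component regression: PCA on centered X, then least squares."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of vt are the principal directions of the centered features.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:n_components].T
    scores = Xc @ V
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    coef = V @ gamma  # map back to the original feature space
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def pcr_predict(X, coef, intercept):
    """Predict an emotional-dimension level (e.g. arousal) from features."""
    return X @ coef + intercept
```

In practice the features would be computed per frame over a whole clip and aggregated before regression against arousal or valence annotations; the number of retained components is a tuning choice that the abstract does not specify.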
Pages: 11