Directed Acyclic Graphs for Content Based Sound, Musical Genre, and Speech Emotion Classification

Cited by: 10
Authors
Ntalampiras, Stavros [1 ]
Institutions
[1] Politecn Milan, I-20133 Milan, Italy
Keywords
audio signal processing; content-based generalized sound recognition; decision directed acyclic graph; hidden Markov model; recognition; model
DOI
10.1080/09298215.2013.859709
CLC Classification Number
TP39 [Computer Applications]
Subject Classification Number
081203 ; 0835 ;
Abstract
This work introduces the methodology of Decision Directed Acyclic Graphs (DDAG) to the scientific domain of content-based audio signal processing. We apply this methodology to three multiclass classification problems involving the categories of generalized sound events, musical genres, and speech expressing emotional states. A decision graph is constructed which breaks the overall problem into a series of two-class ones. The order of the graph nodes is determined using a clustering criterion based on the Kullback-Leibler divergence. Every graph node is composed of two hidden Markov models, each representing one of the classes participating in the specific two-class problem. We extract three heterogeneous feature sets (Mel-Filterbank, MPEG-7 Audio Spectrum Projection, and Perceptual Wavelet Packets) from each recording and fuse them for training the HMMs. Extensive comparative experiments are conducted using the following three datasets: (a) a combination of professional sound-effects collections, (b) the GTZAN musical genre database, and (c) the BERLIN emotional speech corpus. The results demonstrate the superiority of the DDAG classification approach over the standard HMM approach regardless of the application task.
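The DDAG evaluation described in the abstract reduces an N-class problem to N-1 pairwise decisions: each graph node compares two candidate classes and eliminates one, and the class that survives all comparisons is the prediction. The following is a minimal sketch of that traversal, not the paper's implementation: `node_score` is a hypothetical stand-in for a DDAG node, which in the paper compares the log-likelihoods of the node's two class-specific HMMs, and the class labels and likelihood values below are illustrative only.

```python
def ddag_classify(candidates, node_score):
    """Classify via a Decision Directed Acyclic Graph traversal.

    candidates: list of class labels, ordered as in the graph's top level
                (in the paper this order comes from KL-divergence clustering).
    node_score: callable (a, b) -> winning label; stands in for a graph node
                (in the paper, a comparison of two HMM log-likelihoods).
    """
    remaining = list(candidates)
    # Each node eliminates exactly one class: compare the first and last
    # remaining candidates and drop whichever loses the two-class decision.
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        if node_score(a, b) == a:
            remaining.pop()      # b is eliminated
        else:
            remaining.pop(0)     # a is eliminated
    return remaining[0]


# Hypothetical usage: per-class log-likelihoods decide each pairwise node.
loglik = {"sound": -10.0, "music": -3.0, "speech": -7.0}
pred = ddag_classify(["sound", "music", "speech"],
                     lambda a, b: a if loglik[a] > loglik[b] else b)
# -> "music" (it wins every pairwise comparison it participates in)
```

For N classes, exactly N - 1 node evaluations are made along any path through the graph, which is what makes the scheme cheaper than evaluating all N(N-1)/2 pairs while still using only two-class decisions.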
Pages: 173-182
Page count: 10