Classification of Multi Speaker Shouted Speech and Single Speaker Normal Speech

Cited by: 0
Authors
Baghel, Shikha [1]
Prasanna, S. R. Mahadeva [1]
Guha, Prithwijit [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, Assam, India
Source
TENCON 2017 - 2017 IEEE REGION 10 CONFERENCE | 2017
Keywords
Shouted / normal speech classification; Source features; spectral features; SVM;
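DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809
Abstract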
This work proposes a method for classifying multi-speaker shouted speech versus single-speaker normal speech, which is the most frequently occurring scenario in news debates. For brevity, the multi-speaker shouted and single-speaker normal speech classes are referred to as shouted and normal speech, respectively. Spectral features and source features are explored for the classification task. The source characteristics are studied in terms of strength of excitation (SoE). The spectral features explored are spectral flux, spectral tilt, the sum of the ten largest spectral peaks (STLP), modulation spectrum energy (ModSE), and Mel-frequency cepstral coefficients (MFCCs). Shouted and normal speech are classified using two approaches. In the first approach, all features except the MFCCs are non-linearly mapped and combined using a threshold-based technique. In the second approach, a Support Vector Machine (SVM) classifier with a predefined radial basis function (RBF) kernel is applied to the extracted features. Performance is evaluated in terms of F-score. A leave-one-out analysis is also used to measure the strength of each individual feature for this task. According to this analysis, SoE is the most important of the one-dimensional features. When all features are combined for classification, the resulting forty-four-dimensional feature vector gives the highest F-score.
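The evaluation idea in the abstract (threshold-based fusion of mapped features, scored by F-score, with a leave-one-out check of each feature) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the sigmoid used as the non-linear mapping, the averaging of mapped values, the 0.5 decision threshold, the reading of "leave one out" as dropping one feature at a time, and the helper names `f_score`, `sigmoid_map`, `classify`, and `leave_one_out`. The toy numbers are invented and say nothing about how the real features behave.

```python
import math


def f_score(y_true, y_pred, positive=1):
    """F-score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sigmoid_map(x, center, slope):
    """Assumed non-linear mapping of a raw feature value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


def classify(frames, maps, threshold=0.5, skip=None):
    """Label a frame 1 (shouted) when the mean mapped score exceeds threshold.

    frames: list of dicts, feature name -> raw value
    maps:   dict, feature name -> (center, slope) for sigmoid_map
    skip:   optional feature name to leave out of the fusion
    """
    used = [name for name in maps if name != skip]
    preds = []
    for frame in frames:
        score = sum(sigmoid_map(frame[n], *maps[n]) for n in used) / len(used)
        preds.append(1 if score > threshold else 0)
    return preds


def leave_one_out(frames, labels, maps, threshold=0.5):
    """F-score obtained when each feature in turn is excluded."""
    return {
        name: f_score(labels, classify(frames, maps, threshold, skip=name))
        for name in maps
    }


# Invented toy data: 1 = shouted, 0 = normal.
frames = [
    {"soe": 0.9, "spectral_flux": 0.8, "spectral_tilt": 0.2},
    {"soe": 0.8, "spectral_flux": 0.3, "spectral_tilt": 0.7},
    {"soe": 0.2, "spectral_flux": 0.6, "spectral_tilt": 0.4},
    {"soe": 0.1, "spectral_flux": 0.2, "spectral_tilt": 0.3},
]
labels = [1, 1, 0, 0]
maps = {name: (0.5, 10.0) for name in frames[0]}

print("all features:", f_score(labels, classify(frames, maps)))
for name, score in leave_one_out(frames, labels, maps).items():
    print("without", name, "->", score)
```

Under this reading, a feature whose removal lowers the F-score the most would be considered the strongest. The same `f_score` helper could score the predictions of an RBF-kernel SVM in place of `classify`.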
Pages: 2388-2392
Number of pages: 5