Classification of Multi Speaker Shouted Speech and Single Speaker Normal Speech

Cited by: 0
Authors
Baghel, Shikha [1]
Prasanna, S. R. Mahadeva [1]
Guha, Prithwijit [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, Assam, India
Source
TENCON 2017 - 2017 IEEE REGION 10 CONFERENCE | 2017
Keywords
Shouted/normal speech classification; source features; spectral features; SVM
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract
This work proposes a method for classifying multi-speaker shouted speech versus single-speaker normal speech, the most frequently occurring scenario in news debates. Throughout, the multi-speaker shouted and single-speaker normal speech classes are referred to as shouted and normal speech, respectively. Spectral features and source features are explored for the classification task. The source characteristics are studied in terms of strength of excitation (SoE). Spectral flux, spectral tilt, the sum of the ten largest spectral peaks (STLP), modulation spectrum energy (ModSE), and Mel-frequency cepstral coefficients (MFCCs) are explored as the spectral features. Shouted and normal speech are classified using two approaches. In the first approach, all features except the MFCCs are non-linearly mapped and combined using a threshold-based technique. In the second approach, a Support Vector Machine (SVM) classifier with a predefined radial basis function (RBF) kernel is applied to the extracted features. Performance is evaluated in terms of F-score. A leave-one-out analysis is also carried out to measure the strength of each individual feature for this task. According to this analysis, SoE is the most important of the one-dimensional features. The highest F-score is obtained when all features are combined into a forty-four-dimensional feature vector.
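The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "F-score" means the balanced F1 measure and that the leave-one-out analysis removes one feature at a time and compares the result against the full feature set; the abstract does not confirm either detail. The names `f_score`, `leave_one_feature_out`, and the `evaluate` callback are hypothetical and introduced only for illustration.

```python
from typing import Callable, Dict, List, Sequence


def f_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Balanced F1 score for the class labelled `positive` (assumed: shouted speech)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def leave_one_feature_out(
    feature_names: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Return, per feature, the F-score drop observed when that feature is removed.

    `evaluate` is a hypothetical callback that trains and tests a classifier
    on the given feature subset and returns its F-score.
    """
    baseline = evaluate(list(feature_names))
    drops = {}
    for name in feature_names:
        kept = [f for f in feature_names if f != name]
        drops[name] = baseline - evaluate(kept)
    return drops
```

Under these assumptions, the feature whose removal causes the largest drop would be read as the most important one, which is how the abstract's statement about SoE can be interpreted.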
Pages: 2388-2392
Number of pages: 5