Speech Personality Recognition Based on Annotation Classification Using Log-Likelihood Distance and Extraction of Essential Audio Features

Cited by: 13
Authors
Liu, Zhen-Tao [1 ,2 ,3 ]
Rehman, Abdul [1 ,2 ,3 ]
Wu, Min [1 ,2 ,3 ]
Cao, Wei-Hua [1 ,2 ,3 ]
Hao, Man [1 ,2 ,3 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430074, Peoples R China
[3] Minist Educ, Engn Res Ctr Intelligent Technol Geoexplorat, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Speech recognition; Reliability; Training; Emotion recognition; Human computer interaction; Task analysis; Personality analysis; speech emotion recognition; acoustic features; annotation clustering; EMOTION REGULATION; TRAITS; PREDICTION; ROBUST; MODEL; IMPRESSIONS; RELIABILITY; LIKABILITY; CONTINUITY; SUPPORT;
DOI
10.1109/TMM.2020.3025108
Chinese Library Classification: TP [Automation technology; computer technology]
Discipline code: 0812
Abstract
Speech personality recognition relies on training models that require an excessive number of features and are, in most cases, designed specifically for certain databases. As a result, overfitted classifier models are not always reliable when tested on different datasets, because their accuracy changes with the domain of the speakers. Moreover, personality annotations are often subjective, which creates variability in raters' perception during labeling. These problems inhibit the effectiveness of speech personality recognition applications. To reduce the unexplained variance caused by unknown differences in raters' perception, a structure that uses the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm is proposed. Furthermore, a feature extraction method is proposed that filters out undesirable adulterations, be they noise, silence, or uncertain pitch segments, while extracting essential audio features, i.e., signal power roll-off, pitch, and pause rate. Experiments on the standard SSPNet dataset record a relative 4% increase in overall accuracy when log-likelihood-based annotations are used. Moreover, improved consistency in accuracy is observed when the method is tested on male and female subsets.
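The abstract names three essential audio features (signal power roll-off, pitch, and pause rate) and a BIRCH-based clustering of annotations. The sketch below is a minimal illustration of that pipeline shape, not the authors' implementation: it uses librosa for the features and scikit-learn's Birch, which measures Euclidean rather than the paper's log-likelihood distance. The roll-off percentile, silence threshold, pitch range, and toy annotation matrix are all assumptions.

```python
# Minimal sketch (assumed parameters, not the paper's exact method):
# extract roll-off, pitch, and pause rate per clip, then cluster
# rater annotations with BIRCH.
import numpy as np
import librosa
from sklearn.cluster import Birch

def extract_features(path, top_db=30):
    """Return (mean roll-off, mean voiced pitch, pause rate) for one clip."""
    y, sr = librosa.load(path, sr=None)

    # Spectral roll-off: frequency below which 85% of spectral energy lies.
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.85)

    # Pitch via the pYIN tracker; unvoiced/uncertain frames come back as
    # NaN, so averaging only voiced frames drops uncertain pitch segments.
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C6"))
    pitch = float(np.nanmean(f0)) if np.any(voiced) else 0.0

    # Pause rate: fraction of the clip outside the non-silent intervals
    # detected by an energy threshold (top_db is an assumed value).
    intervals = librosa.effects.split(y, top_db=top_db)
    voiced_samples = sum(end - start for start, end in intervals)
    pause_rate = 1.0 - voiced_samples / len(y)

    return np.array([rolloff.mean(), pitch, pause_rate])

# Annotation clustering: each row holds one clip's ratings from several
# raters (toy random data here). BIRCH groups clips whose rating patterns
# agree, standing in for the paper's log-likelihood-based classification.
annotations = np.random.rand(100, 5)            # 100 clips x 5 raters
labels = Birch(n_clusters=2).fit_predict(annotations)
```

Reproducing the paper's annotation classification would require replacing scikit-learn's Euclidean merging criterion with the proposed log-likelihood distance; the sketch only shows where that step sits in the pipeline.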
Pages: 3414-3426
Number of pages: 13