Speech Personality Recognition Based on Annotation Classification Using Log-Likelihood Distance and Extraction of Essential Audio Features

Cited by: 13
Authors
Liu, Zhen-Tao [1 ,2 ,3 ]
Rehman, Abdul [1 ,2 ,3 ]
Wu, Min [1 ,2 ,3 ]
Cao, Wei-Hua [1 ,2 ,3 ]
Hao, Man [1 ,2 ,3 ]
Affiliations
[1] China Univ Geosci, Sch Automat, Wuhan 430074, Peoples R China
[2] Hubei Key Lab Adv Control & Intelligent Automat C, Wuhan 430074, Peoples R China
[3] Minist Educ, Engn Res Ctr Intelligent Technol Geoexplorat, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Speech recognition; Reliability; Training; Emotion recognition; Human computer interaction; Task analysis; Personality analysis; speech emotion recognition; acoustic features; annotation clustering; EMOTION REGULATION; TRAITS; PREDICTION; ROBUST; MODEL; IMPRESSIONS; RELIABILITY; LIKABILITY; CONTINUITY; SUPPORT;
DOI
10.1109/TMM.2020.3025108
Chinese Library Classification: TP [Automation technology; computer technology]
Discipline code: 0812
Abstract
Speech personality recognition relies on training models that require an excessive number of features and are, in most cases, designed specifically for certain databases. As a result, overfitted classifier models are not always reliable when tested on different datasets, because their accuracy changes with the domain of the speakers. Moreover, personality annotations are often subjective, which creates variability in raters' perception during labeling. These problems inhibit the effectiveness of speech personality recognition applications. To reduce the unexplained variance caused by unknown differences in raters' perception, a structure that uses the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm is proposed. Furthermore, a feature extraction method is proposed that filters out undesirable adulterations, be they noise, silence, or uncertain pitch segments, while extracting essential audio features, i.e., signal power roll-off, pitch, and pause rate. Experiments on the standard SSPNet dataset record a relative 4% increase in overall accuracy when log-likelihood-based annotations are used. Moreover, improved consistency in accuracy is observed when the method is tested on male and female subsets.
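The abstract names three essential audio features (signal power roll-off, pitch, and pause rate) and a BIRCH-based clustering of annotations. The sketch below is a minimal illustration of that pipeline shape, not the authors' implementation: it uses librosa for the features and scikit-learn's Birch, which measures Euclidean rather than the paper's log-likelihood distance. The roll-off percentile, silence threshold, pitch range, and toy annotation matrix are all assumptions.

```python
# Minimal sketch (assumed parameters, not the paper's exact method):
# extract roll-off, pitch, and pause rate per clip, then cluster
# rater annotations with BIRCH.
import numpy as np
import librosa
from sklearn.cluster import Birch

def extract_features(path, top_db=30):
    """Return (mean roll-off, mean voiced pitch, pause rate) for one clip."""
    y, sr = librosa.load(path, sr=None)

    # Spectral roll-off: frequency below which 85% of spectral energy lies.
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.85)

    # Pitch via the pYIN tracker; unvoiced/uncertain frames come back as
    # NaN, so averaging only voiced frames drops uncertain pitch segments.
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C6"))
    pitch = float(np.nanmean(f0)) if np.any(voiced) else 0.0

    # Pause rate: fraction of the clip outside the non-silent intervals
    # detected by an energy threshold (top_db is an assumed value).
    intervals = librosa.effects.split(y, top_db=top_db)
    voiced_samples = sum(end - start for start, end in intervals)
    pause_rate = 1.0 - voiced_samples / len(y)

    return np.array([rolloff.mean(), pitch, pause_rate])

# Annotation clustering: each row holds one clip's ratings from several
# raters (toy random data here). BIRCH groups clips whose rating patterns
# agree, standing in for the paper's log-likelihood-based classification.
annotations = np.random.rand(100, 5)            # 100 clips x 5 raters
labels = Birch(n_clusters=2).fit_predict(annotations)
```

Reproducing the paper's annotation classification would require replacing scikit-learn's Euclidean merging criterion with the proposed log-likelihood distance; the sketch only shows where that step sits in the pipeline.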
Pages: 3414-3426
Number of pages: 13