Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech from Existing Podcast Recordings

Cited by: 207
Authors
Lotfian, Reza [1]
Busso, Carlos [1]
Affiliations
[1] Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75080 USA
Funding
National Science Foundation (USA)
Keywords
Databases; Speech; Speech recognition; Digital audio broadcasting; Speech processing; Emotion recognition; Machine learning algorithms; Affective corpus; emotion recognition; expressive speech; information retrieval; emotion ranking; RECOGNITION; SYSTEM;
DOI
10.1109/TAFFC.2017.2736999
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The lack of a large, natural emotional database is one of the key barriers to translating results on speech emotion recognition obtained under controlled conditions into real-life applications. Collecting emotional databases is expensive and time-consuming, which limits the size of existing corpora. Current approaches used to collect spontaneous databases tend to provide unbalanced emotional content, which is dictated by the given recording protocol (e.g., positive for colloquial conversations, negative for discussions or debates). The size and speaker diversity are also limited. This paper proposes a novel approach to effectively build a large, naturalistic emotional database with balanced emotional content, reduced cost, and reduced manual labor. It relies on existing spontaneous recordings obtained from audio-sharing websites. The proposed approach combines machine learning algorithms that retrieve recordings conveying balanced emotional content with a cost-effective annotation process using crowdsourcing, which makes it possible to build a large-scale emotional speech database. This approach provides natural emotional renditions from multiple speakers, recorded under different channel conditions and conveying balanced emotional content, a combination that is difficult to obtain with alternative data collection protocols.
Pages: 471-483
Number of pages: 13