Identifying Conflict Escalation and Primates by Using Ensemble X-vectors and Fisher Vector Features

Cited by: 7
Authors
Egas-Lopez, Jose Vicente [1]
Vetrab, Mercedes [1]
Toth, Laszlo [1]
Gosztolya, Gabor [1, 2]
Affiliations
[1] Univ Szeged, Inst Informat, Szeged, Hungary
[2] ELRN, MTA SZTE Res Grp Artificial Intelligence, Szeged, Hungary
Source
INTERSPEECH 2021 | 2021
Keywords
human-computer interaction; computational paralinguistics; x-vectors; Fisher vectors; ensemble learning; OF-AUDIO-WORDS; STYRIAN DIALECTS; CLASSIFICATION; REPRESENTATION; SLEEPINESS; SPEECH; SOUNDS; BABY;
DOI
10.21437/Interspeech.2021-1173
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Computational paralinguistics is concerned with the automatic identification of non-verbal information in human speech. The Interspeech ComParE challenge features new paralinguistic tasks each year; this time, among others, a cross-corpus conflict escalation task and the identification of primates based solely on audio are the actual problems set. In our entry to ComParE 2021, we utilize x-vectors and Fisher vectors as features. To improve the robustness of the predictions, we also experiment with building an ensemble of classifiers from the x-vectors. Lastly, we exploit the fact that the Escalation Sub-Challenge is a conflict detection task, and incorporate the SSPNet Conflict Corpus in our training workflow. Using these approaches, at the time of writing, we had already surpassed the official Challenge baselines on both tasks, which demonstrates the efficiency of the employed techniques.
Pages: 476-480
Page count: 5
Related papers
33 in total
[1]  
[Anonymous], 2010, P 18 ACM INT C MULT
[2]  
[Anonymous], 2011, P INTERSPEECH
[3]  
[Anonymous], 2011, P ASRU
[4]   LIBSVM: A Library for Support Vector Machines [J].
Chang, Chih-Chung ;
Lin, Chih-Jen .
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
[5]   Front-End Factor Analysis for Speaker Verification [J].
Dehak, Najim ;
Kenny, Patrick J. ;
Dehak, Reda ;
Dumouchel, Pierre ;
Ouellet, Pierre .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) :788-798
[6]  
Egas-Lopez J. V., 2021, P ICASSP
[7]  
Eyben F., 2013, P 21 ACM INT C MULT, P835, DOI 10.1145/2502081.2502224
[8]   Very Short-term Conflict Intensity Estimation Using Fisher Vectors [J].
Gosztolya, Gabor .
INTERSPEECH 2020, 2020, :3127-3131
[9]   Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy [J].
Gosztolya, Gabor ;
Busa-Fekete, Robert .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 :477-488
[10]   Using the Fisher Vector Representation for Audio-based Emotion Recognition [J].
Gosztolya, Gabor .
ACTA POLYTECHNICA HUNGARICA, 2020, 17 (06) :7-23