KAN-AV dataset for audio-visual face and speech analysis in the wild

Cited by: 1
Authors
Kefalas, Triantafyllos [1]
Fotiadou, Eftychia [1]
Georgopoulos, Markos [1]
Panagakis, Yannis [2]
Ma, Pingchuan [1]
Petridis, Stavros [1]
Stafylakis, Themos [3]
Pantic, Maja [1]
Affiliations
[1] Imperial Coll, Dept Comp, London, England
[2] Univ Athens, Dept Informat & Telecommun, Athens, Greece
[3] Omilia Conversat Intelligence, Athens, Greece
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
KAN-AV; Speaker verification; Kinship verification; Age-invariant; Cross-modal matching; Audio-visual; RECOGNITION; REPRESENTATIONS; FAMILIES; DATABASE
DOI
10.1016/j.imavis.2023.104839
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human-computer interaction is becoming increasingly prevalent in daily life with the adoption of intelligent devices. These devices must be capable of interacting in diverse settings, such as environments with noise, music, and varying illumination and occlusion conditions. They must also interact with a variety of end users across ages and backgrounds. The machine learning community therefore needs in-the-wild multi-modal datasets to develop face and speech analysis models that are applicable in most real-world scenarios. However, most existing audio and audio-visual databases are captured in controlled conditions and have few or no age and kinship labels. In this paper, we introduce the KAN-AV dataset, which contains 98 h of audio-visual data from 970 identities across ages. Two thirds of the identities have kin relations in the dataset. The dataset is manually annotated with labels for kinship, age, and gender and is intended to drive future research in face and speech analysis.
Pages: 12