FaciaVox: A diverse multimodal biometric dataset of facial images and voice recordings

Cited by: 0

Authors:

Abuqaaud, Kamal [1,2]

Nassif, Ali Bou [3]

Shahin, Ismail [1]

Affiliations:

[1] Univ Sharjah, Dept Elect Engn, Sharjah 27272, U Arab Emirates

[2] Higher Coll Technol, Dept Elect Engn, Sharjah, U Arab Emirates

[3] Univ Sharjah, Dept Comp Engn, Sharjah 27272, U Arab Emirates

Source:

DATA IN BRIEF | 2025 / Volume 60

Keywords:

Biometrics; Cloning; Face recognition; Feature fusion; Masked faces; Multimodality; Speaker identification;

DOI:

10.1016/j.dib.2025.111489

Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

FaciaVox is a multimodal biometric dataset of face images and voice recordings captured under both masked and unmasked conditions. The name "FaciaVox" was chosen to be distinct and easily memorable, and to highlight the dataset's multimodal character and its relevance to biometric recognition tasks. The dataset consists of contributions from 100 participants from 20 different countries, each providing 18 facial images and 60 audio recordings. The facial images are stored in JPG format and the audio recordings as WAV files, ensuring compatibility with standard processing tools. Participants are categorized by age into four groups: Group 1 includes individuals below 16 years of age; Group 2 corresponds to those aged 16 up to less than 31; Group 3 encompasses participants aged 31 up to less than 46; and Group 4 represents individuals aged 46 and above. Data collection was conducted in two environments: a professional soundproof studio and a conventional classroom. While the studio provided a controlled setting, the classroom introduced variables such as echo and sound reflections. Some participants were recorded in the studio and others in the classroom; the file named 'FaciaVox list' specifies where each participant was recorded. Participants were positioned 70 to 100 cm from the iPhone's rear camera, and three zoom levels (1x, 3x, and 5x) were used to obtain the facial photos. Each participant provided a total of 18 facial photos, comprising six different images captured at each zoom level. The six images followed a sequence of conditions: the first was captured without a face mask, followed by images in which the participant wore a disposable mask, then a reusable mask, then a dual-layer cloth mask. Next, a silicon face shield was worn together with the cloth mask, and in the final images the silicon shield was worn on its own. Each participant was instructed to speak ten sentences, switching between English and Arabic, under the six conditions described above. The speech was recorded using the Zoom H6 Handy Recorder. The FaciaVox dataset supports a wide range of studies on face images and audio signals with and without face masks. It serves as a foundational resource for investigating applications including, but not limited to, multimodal biometrics, cross-domain biometric fusion, age and gender estimation, human-machine interaction, deep learning, speech intelligence, voice cloning, image inpainting, and security and surveillance. © 2025 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/)
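The per-participant counts and age grouping described in the abstract can be checked with a short sketch. This is only an illustration under stated assumptions: the condition labels, constant names, and the functions `age_group` and `expected_counts` are hypothetical and do not reflect the dataset's actual file or folder naming, which the abstract does not specify. The totals assume every participant contributed the full set.

```python
from itertools import product

# Hypothetical labels: the abstract does not give the dataset's actual
# file or folder naming scheme.
CONDITIONS = [
    "no_mask",
    "disposable_mask",
    "reusable_mask",
    "cloth_mask",
    "cloth_mask_with_shield",
    "shield_only",
]
ZOOM_LEVELS = ["1x", "3x", "5x"]
SENTENCES_PER_CONDITION = 10
NUM_PARTICIPANTS = 100


def age_group(age):
    """Map an age in years to the four groups described in the abstract."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 16:
        return 1  # below 16
    if age < 31:
        return 2  # 16 up to less than 31
    if age < 46:
        return 3  # 31 up to less than 46
    return 4      # 46 and above


def expected_counts():
    """Count images and recordings implied by the described protocol."""
    # One image per (zoom level, condition) pair.
    images = list(product(ZOOM_LEVELS, CONDITIONS))
    # One recording per (condition, sentence index) pair.
    recordings = list(
        product(CONDITIONS, range(1, SENTENCES_PER_CONDITION + 1))
    )
    return {
        "images_per_participant": len(images),
        "recordings_per_participant": len(recordings),
        # Totals assume a complete set from every participant.
        "total_images": NUM_PARTICIPANTS * len(images),
        "total_recordings": NUM_PARTICIPANTS * len(recordings),
    }


if __name__ == "__main__":
    print(age_group(29))
    print(expected_counts())
```

Three zoom levels times six conditions gives the 18 images per participant, and six conditions times ten sentences gives the 60 recordings, which matches the figures stated in the abstract.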


Pages: 12
