Vietnam-Celeb: a large-scale dataset for Vietnamese speaker recognition

Cited by: 1

Authors:

Pham Viet Thanh [1]

Nguyen Xuan Thai Hoa [1]

Hoang Long Vu [1]

Nguyen Thi Thu Trang [1]

Affiliations:

[1] Hanoi Univ Sci & Technol, Hanoi, Vietnam

Source:

INTERSPEECH 2023 | 2023

Keywords:

speaker recognition; speaker verification; Vietnamese dataset

DOI:

10.21437/Interspeech.2023-1989

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

The success of speaker recognition systems heavily depends on large training datasets collected under real-world conditions. While common languages like English or Chinese have vastly available datasets, low-resource ones like Vietnamese remain limited. This paper presents a large-scale spontaneous dataset gathered under noisy environments, with over 87,000 utterances from 1,000 Vietnamese speakers of many professions, covering 3 main Vietnamese dialects. To build the dataset, we propose a sophisticated construction pipeline that can also be applied to other languages, with efficient visual-aided processing techniques to boost data precision. With the state-of-the-art x-vector model, training with the proposed dataset shows an average absolute and relative EER improvement of 5.48% and 41.61% when compared to the model trained on VLSP 2021, a publicly available Vietnamese speaker dataset.

Pages: 1918-1922

Page count: 5
