Vietnam-Celeb: a large-scale dataset for Vietnamese speaker recognition

Cited by: 1
Authors
Pham Viet Thanh [1]
Nguyen Xuan Thai Hoa [1]
Hoang Long Vu [1]
Nguyen Thi Thu Trang [1]
Affiliations
[1] Hanoi Univ Sci & Technol, Hanoi, Vietnam
Source
INTERSPEECH 2023 | 2023
Keywords
speaker recognition; speaker verification; Vietnamese dataset
DOI
10.21437/Interspeech.2023-1989
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The success of speaker recognition systems heavily depends on large training datasets collected under real-world conditions. While common languages like English or Chinese have vastly available datasets, low-resource ones like Vietnamese remain limited. This paper presents a large-scale spontaneous dataset gathered under noisy environments, with over 87,000 utterances from 1,000 Vietnamese speakers of many professions, covering 3 main Vietnamese dialects. To build the dataset, we propose a sophisticated construction pipeline that can also be applied to other languages, with efficient visual-aided processing techniques to boost data precision. With the state-of-the-art x-vector model, training with the proposed dataset shows an average absolute and relative EER improvement of 5.48% and 41.61% when compared to the model trained on VLSP 2021, a publicly available Vietnamese speaker dataset.
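The abstract reports both an absolute and a relative EER improvement. As a rough illustration of how these two quantities differ, the following is a minimal sketch and not the authors' evaluation code. The function names and all scores are hypothetical, and the threshold sweep is a simple approximation of the equal error rate (the operating point where the false acceptance and false rejection rates meet).

```python
def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate (in percent) by sweeping thresholds.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where false acceptance and false rejection rates are
    closest, as the mean of the two rates.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return 100.0 * best_eer


def eer_improvement(baseline_eer, new_eer):
    """Return (absolute, relative) EER improvement, both in percent."""
    absolute = baseline_eer - new_eer            # percentage points
    relative = 100.0 * absolute / baseline_eer   # share of the baseline EER
    return absolute, relative


# Hypothetical scores, chosen only to make the arithmetic easy to follow.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(compute_eer(targets, nontargets))

# Hypothetical EERs: a baseline of 10% reduced to 6%.
print(eer_improvement(10.0, 6.0))
```

The averages quoted in the abstract are taken over several comparisons, so the per-comparison baseline EERs cannot be recovered from the two reported numbers alone.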
Pages: 1918-1922
Page count: 5