THE ROYALFLUSH AUTOMATIC SPEECH DIARIZATION AND RECOGNITION SYSTEM FOR IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE

Cited by: 1
Authors
Tian, Jingguang [1]
Ye, Shuaishuai [1]
Chen, Shunfei [1]
Xiang, Yang [1]
Yin, Zhaohui [1]
Hu, Xinhui [1]
Xu, Xinkang [1]
Affiliations
[1] Hithink RoyalFlush AI Res Inst, Hangzhou, Zhejiang, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024 | 2024
Keywords
ICMC-ASR; ASDR; TS-VAD; speaker diarization; speech recognition;
DOI
10.1109/ICASSPW62465.2024.10626136
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on the development set. For speech recognition, we utilize self-supervised learning representations to train end-to-end ASR models. By integrating these models, we achieve a character error rate (CER) of 16.93% on the track 1 evaluation set, and a concatenated minimum permutation character error rate (cpCER) of 25.88% on the track 2 evaluation set.
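The abstract reports results as CER (Track 1) and cpCER (Track 2). As a rough illustration of how the two metrics relate, the sketch below computes CER as character-level edit distance divided by reference length, and cpCER by concatenating each speaker's utterances and taking the speaker permutation that gives the lowest total error. This is a minimal sketch under assumptions: the function names and the toy transcripts are hypothetical, and the challenge's official scoring script (text normalization, handling of speaker-count mismatches) is not described in this record and may differ.

```python
from itertools import permutations


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


def cp_cer(ref_by_speaker, hyp_by_speaker) -> float:
    """Concatenated minimum-permutation CER (illustrative version).

    Each argument is a list with one entry per speaker; each entry is
    that speaker's utterances in time order. Missing speakers on either
    side are padded with empty transcripts (an assumption of this sketch).
    """
    refs = ["".join(utts) for utts in ref_by_speaker]
    hyps = ["".join(utts) for utts in hyp_by_speaker]
    n = max(len(refs), len(hyps))
    refs += [""] * (n - len(refs))
    hyps += [""] * (n - len(hyps))
    total_ref_chars = sum(len(r) for r in refs)
    # Try every assignment of hypothesis speakers to reference speakers.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_chars


# Toy example: hypothesis speaker labels are swapped and one character is wrong.
reference = [["你好", "世界"], ["早上好"]]
hypothesis = [["早上好"], ["你好", "世介"]]
score = cp_cer(reference, hypothesis)
```

Because the permutation search is factorial in the number of speakers, this brute-force form is only practical for the small speaker counts typical of an in-car session.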
Pages: 1 / 2
Page count: 2