ICMC-ASR: THE ICASSP 2024 IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE

Cited by: 1
Authors
Wang, He [1 ]
Guo, Pengcheng [1 ]
Li, Yue [1 ]
Zhang, Ao [1 ]
Sun, Jiayao [1 ]
Xie, Lei [1 ]
Chen, Wei [2 ]
Zhou, Pan [2 ]
Bu, Hui [3 ]
Xu, Xin [3 ]
Zhang, Binbin [4 ]
Chen, Zhuo [5 ]
Wu, Jian [6 ]
Wang, Longbiao [7 ]
Chng, Eng Siong [8 ]
Li, Sun [9 ]
Affiliations
[1] Northwestern Polytech Univ, Xian, Peoples R China
[2] Space AI, LI Auto, Chengdu, Peoples R China
[3] Beijing AI Shell Technol Co Ltd, Beijing, Peoples R China
[4] WeNet Open Source Community, Shanghai, Peoples R China
[5] ByteDance, Beijing, Peoples R China
[6] Microsoft Corp, Redmond, WA USA
[7] Tianjin Univ, Tianjin, Peoples R China
[8] Nanyang Technol Univ, Singapore, Singapore
[9] China Acad Informat & Commun Technol, Beijing, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024 | 2024
Keywords
Multi-channel; Automatic Speech Recognition
DOI
10.1109/ICASSPW62465.2024.10627712
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours of noise for data augmentation. Two tracks are set up: automatic speech recognition (ASR) and automatic speech diarization and recognition (ASDR), evaluated by character error rate (CER) and concatenated minimum-permutation character error rate (cpCER), respectively. Overall, the ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid results across the two tracks. In the end, the first-place team USTC-iflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track, absolute improvements of 13.08% and 51.4% over our challenge baselines, respectively.
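The ASR track's metric, character error rate, is the Levenshtein edit distance between the hypothesis and reference character sequences divided by the reference length. A minimal sketch of that computation (the function name and implementation are illustrative, not taken from the challenge's scoring toolkit):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edits / reference length."""
    ref, hyp = list(reference), list(hypothesis)
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference characters
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis characters
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

The ASDR track's cpCER extends this by concatenating each speaker's utterances and taking the speaker permutation that minimizes the total CER, which penalizes diarization errors as well as recognition errors.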
Pages: 63-64
Page count: 2