The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results

Cited by: 2
Authors
Zhang, Ao [1 ]
Yu, Fan [1 ]
Huang, Kaixun [1 ]
Xie, Lei [1 ]
Wang, Longbiao [2 ]
Chng, Eng Siong [3 ]
Bu, Hui [4 ]
Zhang, Binbin [5 ]
Chen, Wei [6 ]
Xu, Xin [4 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
[2] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[3] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[4] Beijing Shell Shell Technol Co Ltd, Beijing, Peoples R China
[5] WeNet Open Source Community, Beijing, Peoples R China
[6] Li Auto Inc, Beijing, Peoples R China
Source
2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2022
Keywords
Automatic speech recognition; intelligent cockpit; in-vehicle speech recognition;
DOI
10.1109/ISCSLP57327.2022.10037868
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper summarizes the outcomes of the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC). We first address the motivation for the challenge and then introduce the associated dataset, collected from a new-energy vehicle (NEV) and covering a variety of cockpit acoustic conditions and linguistic content. We then describe the track arrangement and the baseline system. Specifically, we set up two tracks defined by allowed model/system size to investigate resource-constrained and resource-unconstrained setups, targeting in-vehicle embedded and cloud-based ASR systems, respectively. Finally, we summarize the challenge results and report the major observations from the submitted systems.
Pages: 507-511
Page count: 5