Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control

Times cited: 0

Authors:

Blatt, Alexander [1]

Krishnan, Aravind [1]

Klakow, Dietrich [1]

Affiliation:

[1] Saarland University, Saarland Informatics Campus, Saarbrücken, Germany

Source:

INTERSPEECH 2024 | 2024

Keywords:

diarization; speech recognition; speaker role detection; air-traffic control

DOI:

10.21437/Interspeech.2024-1987

Abstract:

Utilizing air-traffic control (ATC) data for downstream natural-language processing tasks requires several preprocessing steps. The key steps are transcribing the data with automatic speech recognition (ASR) and performing speaker diarization, or more specifically speaker role detection (SRD), to split the transcripts into pilot and air-traffic controller (ATCO) transcripts. While traditional approaches tackle these tasks separately, we propose a transformer-based joint ASR-SRD system that solves both tasks at once while relying on a standard ASR architecture. We compare this joint system against two cascaded approaches for ASR and SRD on multiple ATC datasets. Our study shows in which cases our joint system outperforms the two traditional approaches and in which cases the other architectures are preferable. We additionally evaluate how acoustic and lexical differences influence all three architectures and show how to overcome them for our joint architecture.
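The abstract states that the joint system handles ASR and SRD with a standard ASR architecture, but it does not say how speaker roles are represented. One possible scheme, assumed here purely for illustration, is to insert a role token in front of each utterance in the ASR target text, so that an unmodified sequence-to-sequence model learns to emit roles and words together, and then to split the decoded output back into pilot and ATCO transcripts. The tokens `<pilot>`/`<atco>` and the helpers `Segment`, `serialize`, and `split_by_role` are hypothetical; this is a minimal sketch, not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical role tokens (assumption: the paper's actual token inventory is not given).
ROLE_TOKENS = {"pilot": "<pilot>", "atco": "<atco>"}
TOKEN_TO_ROLE = {tok: role for role, tok in ROLE_TOKENS.items()}


@dataclass
class Segment:
    role: str  # "pilot" or "atco"
    text: str  # word-level transcript of one utterance


def serialize(segments: list[Segment]) -> str:
    """Build one ASR target string with a role token before each utterance."""
    return " ".join(f"{ROLE_TOKENS[s.role]} {s.text}" for s in segments)


def split_by_role(hypothesis: str) -> dict[str, list[str]]:
    """Parse a decoded hypothesis back into per-role transcripts."""
    out = {role: [] for role in ROLE_TOKENS}
    current_role = None
    words = []
    for tok in hypothesis.split():
        if tok in TOKEN_TO_ROLE:
            # A new role token closes the previous utterance, if it had any words.
            if current_role is not None and words:
                out[current_role].append(" ".join(words))
            current_role, words = TOKEN_TO_ROLE[tok], []
        elif current_role is not None:
            words.append(tok)
        # Words decoded before the first role token cannot be attributed and are dropped.
    if current_role is not None and words:
        out[current_role].append(" ".join(words))
    return out


if __name__ == "__main__":
    segs = [
        Segment("atco", "lufthansa one two three climb flight level two four zero"),
        Segment("pilot", "climb flight level two four zero lufthansa one two three"),
    ]
    target = serialize(segs)
    print(target)
    print(split_by_role(target))
```

Under this assumed scheme, role assignment comes for free from the decoder's output vocabulary, which is why no architectural change to the ASR model is needed. In a cascaded setup, by contrast, role assignment would be handled by a separate model before or after ASR.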

Pages: 3759–3763

Page count: 5
