Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control

Cited by: 0
Authors
Blatt, Alexander [1]
Krishnan, Aravind [1]
Klakow, Dietrich [1]
Affiliations
[1] Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Source
INTERSPEECH 2024 | 2024
Keywords
diarization; speech recognition; speaker role detection; air-traffic control
DOI
10.21437/Interspeech.2024-1987
Abstract
Using air-traffic control (ATC) data for downstream natural-language processing tasks requires preprocessing. The key steps are transcribing the data via automatic speech recognition (ASR) and applying speaker diarization or, more specifically, speaker role detection (SRD) to divide the transcripts into pilot and air-traffic controller (ATCO) transcripts. While traditional approaches handle these tasks separately, we propose a transformer-based joint ASR-SRD system that solves both tasks jointly while relying on a standard ASR architecture. We compare this joint system against two cascaded approaches for ASR and SRD on multiple ATC datasets. Our study shows in which cases our joint system can outperform the two traditional approaches and in which cases the other architectures are preferable. We additionally evaluate how acoustic and lexical differences influence all architectures and show how to overcome them for our joint architecture.
Pages: 3759-3763
Page count: 5
Related papers
23 in total
  • [1] Automatic Speech Recognition Benchmark for Air-Traffic Communications
    Zuluaga-Gomez, Juan
    Motlicek, Petr
    Zhan, Qingran
    Vesely, Karel
    Braun, Rudolf
    INTERSPEECH 2020, 2020 : 2297 - 2301
  • [2] BERTraffic: BERT-Based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications
    Zuluaga-Gomez, Juan
    Sarfjoo, Seyyed Saeed
    Prasad, Amrutha
    Nigmatulina, Iuliia
    Motlicek, Petr
    Ondrej, Karel
    Ohneiser, Oliver
    Helmke, Hartmut
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022 : 633 - 640
  • [3] The Airbus Air Traffic Control speech recognition 2018 challenge: towards ATC automatic transcription and call sign detection
    Pellegrini, Thomas
    Farinas, Jerome
    Delpech, Estelle
    Lancelot, Francois
    INTERSPEECH 2019, 2019 : 2993 - 2997
  • [4] Iterative Learning of Speech Recognition Models for Air Traffic Control
    Srinivasamurthy, Ajay
    Motlicek, Petr
    Singh, Mittul
    Oualil, Youssef
    Kleinert, Matthias
    Ehr, Heiko
    Helmke, Hartmut
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018 : 3519 - 3523
  • [5] A Two-Step Approach to Leverage Contextual Data: Speech Recognition in Air-Traffic Communications
    Nigmatulina, Iuliia
    Zuluaga-Gomez, Juan
    Prasad, Amrutha
    Sarfjoo, Seyyed Saeed
    Motlicek, Petr
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022 : 6282 - 6286
  • [6] Ensuring Safety for Artificial-Intelligence-Based Automatic Speech Recognition in Air Traffic Control Environment
    Pinska-Chauvin, Ella
    Helmke, Hartmut
    Dokic, Jelena
    Hartikainen, Petri
    Ohneiser, Oliver
    Lasheras, Raquel Garcia
    AEROSPACE, 2023, 10 (11)
  • [7] Enhancing Air Traffic Control Communication Systems with Integrated Automatic Speech Recognition: Models, Applications and Performance Evaluation
    Wang, Zhuang
    Jiang, Peiyuan
    Wang, Zixuan
    Han, Boyuan
    Liang, Haijun
    Ai, Yi
    Pan, Weijun
    SENSORS, 2024, 24 (14)
  • [8] A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems
    Lin, Yi
    Guo, Dongyue
    Zhang, Jianwei
    Chen, Zhengmao
    Yang, Bo
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (08) : 3608 - 3620
  • [9] Speech GAU: A Single Head Attention for Mandarin Speech Recognition for Air Traffic Control
    Zhang, Shiyu
    Kong, Jianguo
    Chen, Chao
    Li, Yabin
    Liang, Haijun
    AEROSPACE, 2022, 9 (08)
  • [10] Streaming End-to-End Target-Speaker Automatic Speech Recognition and Activity Detection
    Moriya, Takafumi
    Sato, Hiroshi
    Ochiai, Tsubasa
    Delcroix, Marc
    Shinozaki, Takahiro
    IEEE ACCESS, 2023, 11 : 13906 - 13917