Air traffic control speech recognition system cross-task & speaker adaptation

Cited by: 12
Authors
de Cordoba, R. [1 ]
Ferreiros, J. [1 ]
San-Segundo, R. [1 ]
Macias-Guarasa, J. [1 ]
Montero, J. M. [1 ]
Fernandez, F. [1 ]
D'Haro, L. F. [1 ]
Pardo, J. M. [1 ]
Affiliation
[1] Univ Politecn Madrid, Speech Technol Grp, Dept Elect Engn, ETSI Telecomunicat, E-28040 Madrid, Spain
DOI
10.1109/MAES.2006.1705165
Chinese Library Classification number
V [Aviation, Aerospace];
Discipline classification code
08 ; 0825 ;
Abstract
We present an overview of the most common techniques used in automatic speech recognition to adapt a general system to a different environment (known as cross-task adaptation), such as an air traffic control (ATC) system. The conditions present in ATC are very specific: highly spontaneous speech, the presence of noise, and a high speaking rate. As a result, a typical speech recognizer yields unsatisfactory recognition results. We have to decide on the best option for the modeling: to develop acoustic models specific to those conditions from scratch using the data available for the new environment, or to carry out cross-task adaptation starting from reliable HMM models (usually requiring less data in the target domain). We begin with a description of the main techniques considered for cross-task adaptation, namely Maximum A Posteriori (MAP), Maximum Likelihood Linear Regression (MLLR), and the two together. We have applied each in two speech recognizers for air traffic control tasks, one for spontaneous speech and the other for a command interface. We show the performance of these techniques and compare them with the development of a new system from scratch. We also show the results obtained for speaker adaptation using a variable amount of adaptation data. The main conclusion is that MLLR can outperform MAP when a large number of transforms is used, and that MLLR followed by MAP is the best option. All of these techniques are better than developing a new system from scratch, showing the effectiveness of mean and variance adaptation.
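To make the two techniques named in the abstract concrete, here is a minimal sketch of how a single Gaussian mean could be adapted. It uses the standard textbook forms of the updates, not the authors' implementation: MLLR applies a shared affine transform to the mean, and MAP interpolates between a prior mean and the adaptation data. The function names, the prior weight `tau`, and the toy numbers are assumptions made for illustration, and variance adaptation is not covered.

```python
def mllr_transform_mean(mean, A, b):
    """MLLR-style mean update: new_mean = A @ mean + b.

    A (matrix) and b (bias) are assumed to have been estimated elsewhere
    and are typically shared by many Gaussians in a regression class.
    """
    return [
        sum(A[r][c] * mean[c] for c in range(len(mean))) + b[r]
        for r in range(len(b))
    ]


def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP mean update for one Gaussian.

    new_mean = (tau * prior_mean + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)

    tau is a hypothetical prior weight: larger values keep the adapted
    mean closer to the prior when little adaptation data is available.
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Toy example of "MLLR followed by MAP": the MLLR output is used as the MAP prior.
general_mean = [0.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
mllr_mean = mllr_transform_mean(general_mean, identity, [1.0, 1.0])
frames = [[3.0, 5.0]] * 10      # ten adaptation frames, all assigned to this Gaussian
posteriors = [1.0] * 10
adapted_mean = map_adapt_mean(mllr_mean, frames, posteriors, tau=10.0)
```

In this toy run the ten frames carry the same total weight as the prior (`tau = 10`), so the adapted mean lands halfway between the MLLR-transformed mean and the data mean. With fewer frames the result stays closer to the prior.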
Pages: 12-17
Page count: 6