Air traffic control speech recognition system cross-task & speaker adaptation

Cited by: 12

Authors:

de Cordoba, R. [1]

Ferreiros, J. [1]

San-Segundo, R. [1]

Macias-Guarasa, J. [1]

Montero, J. M. [1]

Fernandez, F. [1]

D'Haro, L. F. [1]

Pardo, J. M. [1]

Affiliations:

[1] Univ Politecn Madrid, Speech Technol Grp, Dept Elect Engn, ETSI Telecomunicat, E-28040 Madrid, Spain

Source:

IEEE AEROSPACE AND ELECTRONIC SYSTEMS MAGAZINE | 2006 / Vol. 21 / No. 09

Keywords:

DOI:

10.1109/MAES.2006.1705165

Chinese Library Classification number:

V [Aviation, Aerospace];

Discipline classification code:

08 ; 0825 ;

Abstract:

We present an overview of the most common techniques used in automatic speech recognition to adapt a general system to a different environment (known as cross-task adaptation), such as an air traffic control (ATC) system. The conditions present in ATC are very specific: highly spontaneous speech, the presence of noise, and a high speaking rate. As a result, a typical speech recognizer gives unsatisfactory recognition results. We have to decide on the best option for the modeling: to develop acoustic models specific to those conditions from scratch using the data available for the new environment, or to carry out cross-task adaptation starting from reliable HMM models (usually requiring less data in the target domain). We begin with a description of the main techniques considered for cross-task adaptation, namely Maximum A Posteriori (MAP), Maximum Likelihood Linear Regression (MLLR), and the two together. We have applied each in two speech recognizers for air traffic control tasks, one for spontaneous speech and the other for a command interface. We show the performance of these techniques and compare them with the development of a new system from scratch. We also show the results obtained for speaker adaptation using a variable amount of adaptation data. The main conclusion is that MLLR can outperform MAP when a large number of transforms is used, and that MLLR followed by MAP is the best option. All of these techniques are better than developing a new system from scratch, showing the effectiveness of mean and variance adaptation.

Pages: 12-17

Page count: 6

50 items in total

[21] Improved cross-task recognition using MMIE training [J].

Córdoba, R ;

Woodland, PC ;

Gales, MJF .

2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, :85-88

[22] A CONTEXT-AWARE SPEECH RECOGNITION AND UNDERSTANDING SYSTEM FOR AIR TRAFFIC CONTROL DOMAIN [J].

Oualil, Youssef ;

Klakow, Dietrich ;

Szaszak, Gyoergy ;

Srinivasamurthy, Ajay ;

Helmke, Hartmut ;

Motlicek, Petr .

2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, :404-408

[23] Speaker adaptation method based on eigenphone speaker subspace for speech recognition [J].

Qu, Dan ;

Zhang, Wen-Lin .

Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2015, 37 (06) :1350-1356

[24] A speaker clustering algorithm for fast speaker adaptation in continuous speech recognition [J].

Rodríguez, LJ ;

Torres, MI .

TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2004, 3206 :433-440

[25] Iterative Learning of Speech Recognition Models for Air Traffic Control [J].

Srinivasamurthy, Ajay ;

Motlicek, Petr ;

Singh, Mittul ;

Oualil, Youssef ;

Kleinert, Matthias ;

Ehr, Heiko ;

Helmke, Hartmut .

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, :3519-3523

[26] AUTOMATIC SPEECH SEMANTIC RECOGNITION AND VERIFICATION IN AIR TRAFFIC CONTROL [J].

Johnson, Daniel R. ;

Nenov, Val I. ;

Espinoza, Gustavo .

2013 IEEE/AIAA 32ND DIGITAL AVIONICS SYSTEMS CONFERENCE (DASC), 2013,

[27] Automatic Speech Semantic Recognition and Verification in Air Traffic Control [J].

Johnson, Daniel R. ;

Nenov, Val .

2013 IEEE/AIAA 32ND DIGITAL AVIONICS SYSTEMS CONFERENCE (DASC), 2013,

[28] Channel and speaker adaptation techniques for robust speech recognition [J].

Chen, Jingdong ;

Yao, Lei ;

Huang, Taiyi .

Shengxue Xuebao/Acta Acustica, 1998, 23 (06) :537-544

[29] A Combined Speaker Adaptation Method for Mandarin Speech Recognition [J].

徐向华 ;

朱杰 .

Journal of Shanghai Jiaotong University, 2004, (04) :21-24

[30] Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition [J].

Hu, Di ;

Li, Xuhong ;

Mou, Lichao ;

Jin, Pu ;

Chen, Dong ;

Jing, Liping ;

Zhu, Xiaoxiang ;

Dou, Dejing .

COMPUTER VISION - ECCV 2020, PT XXIV, 2020, 12369 :68-84