Non-Intrusive Signal Analysis for Room Adaptation of ASR Models

Cited by: 0

Authors:

Li, Ge [1]

Sharma, Dushyant [2]

Naylor, Patrick A. [3]

Affiliations:

[1] Nuance Commun, Montreal, PQ, Canada

[2] Nuance Commun Inc, Burlington, MA USA

[3] Imperial Coll London, London, England

Source:

2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022) | 2022

Keywords:

SPEECH QUALITY;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

We present a new deep-learning-based non-intrusive signal assessment method (NISA+) that jointly estimates a large set of speech signal parameters, including those related to reverberation (C50, DRR, reflection coefficient and room volume), background noise (SNR), perceptual speech quality (PESQ), speech intelligibility (ESTOI), voice activity detection, and speech coding (codec presence and bitrate). We show that a neural-embedding-based combination of spectral features with an LSTM and modulation features with a convolutional neural network enables NISA+ to achieve state-of-the-art performance. In particular, for non-intrusive PESQ and C50 estimation, we show around 15% relative reduction in estimation error compared to our previous best results. We also show that NISA+ can be used to perform targeted data augmentation, generating ASR training data that matches the signal characteristics extracted from a small sample of data recorded in a target room acoustic environment. We show that a 9.6% word error rate reduction can be achieved relative to an ASR model trained with random augmentation.
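The targeted augmentation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that per-utterance C50 and SNR estimates from the target room are summarized by independent Gaussians, and that augmentation parameters are then sampled from those Gaussians instead of from a broad uniform range. All function names, parameter names, and numeric values below are hypothetical.

```python
import random
import statistics


def fit_target_stats(estimates):
    """Return {parameter: (mean, stdev)} across the target-room sample."""
    stats = {}
    for name in estimates[0]:
        values = [e[name] for e in estimates]
        stats[name] = (statistics.mean(values), statistics.stdev(values))
    return stats


def sample_targeted(stats, n, rng):
    """Draw n augmentation settings from the fitted per-parameter Gaussians."""
    return [
        {name: rng.gauss(mu, sigma) for name, (mu, sigma) in stats.items()}
        for _ in range(n)
    ]


def sample_random(ranges, n, rng):
    """Draw n augmentation settings uniformly from broad ranges (baseline)."""
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n)
    ]


# Hypothetical non-intrusive estimates from a few target-room recordings
# (assumed values, not data from the paper).
target_estimates = [
    {"c50_db": 8.0, "snr_db": 15.0},
    {"c50_db": 10.0, "snr_db": 18.0},
    {"c50_db": 9.0, "snr_db": 12.0},
    {"c50_db": 11.0, "snr_db": 17.0},
]

rng = random.Random(0)
stats = fit_target_stats(target_estimates)
targeted = sample_targeted(stats, 1000, rng)
baseline = sample_random({"c50_db": (0.0, 30.0), "snr_db": (0.0, 40.0)}, 1000, rng)
```

In this sketch the targeted settings cluster around the conditions observed in the target room, while the baseline settings cover the whole assumed range. Each sampled setting would then drive the choice of room impulse response and noise level when simulating training audio.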


Pages: 130-134

Page count: 5

References (34 in total):

[1] Allen, J. B.; Berkley, D. A. Image method for efficiently simulating small-room acoustics [J]. Journal of the Acoustical Society of America, 1979, 65(4): 943-950.

[2] [Anonymous], 1994, CI WSJ1 COMPLETE

[3] [Anonymous], 2001, ITU-T Rec. P.862

[4] Davis, S. B.; Mermelstein, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences [J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980, 28(4): 357-366.

[5] Eaton, James; Gaubitch, Nikolay D.; Moore, Alastair H.; Naylor, Patrick A. Estimation of Room Acoustic Parameters: The ACE Challenge [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(10): 1681-1693.

[6] Elliott, Taffeta M.; Theunissen, Frederic E. The Modulation Transfer Function for Speech Intelligibility [J]. PLoS Computational Biology, 2009, 5(3).

[7] Feifei Xiong, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), p. 5522, DOI 10.1109/ICASSP.2014.6854659

[8] Gallardo, Laura Fernandez; Moeller, Sebastian; Beerends, John. Predicting Automatic Speech Recognition Performance over Communication Channels from Instrumental Speech Quality and Intelligibility Scores [J]. 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction, 2017: 2939-2943.

[9] Gamper H, 2018, INT WORKSH ACOUSTIC, p. 136, DOI 10.1109/IWAENC.2018.8521241

[10] Gong R., 2021, P INT
