Evolution of the performance of automatic speech recognition algorithms in transcribing conversational telephone speech

Cited by: 0

Authors:

Padmanabhan, M [1]

Saon, G [1]

Zweig, G [1]

Huang, J [1]

Kingsbury, B [1]

Mangu, L [1]

Affiliation:

[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

IMTC/2001: PROCEEDINGS OF THE 18TH IEEE INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE, VOLS 1-3: REDISCOVERING MEASUREMENT IN THE AGE OF INFORMATICS | 2001

Keywords:

speech recognition; spontaneous speech; telephone speech; discriminant transforms; boosting; consensus; formant frequencies; spectral peaks

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Research in the speech recognition (speech-to-text conversion) area has been underway for a couple of decades, and a great deal of progress has been made in reducing the word error rate (WER). In this paper, we attempt to summarize the state of the art in speech recognition algorithms. The algorithms we describe span the areas of lexicon design, feature extraction, classifier design, combination of hypotheses, and speaker adaptation of acoustic models. We benchmark the algorithms on two main sources of speech, the first being Voicemail (conversational telephone speech from a single speaker) and the second being Switchboard (conversational telephone speech between two speakers). We also present the results of some cross-domain experiments, which highlight the "brittleness" of speech recognition systems today and illustrate the need to focus research effort on improving cross-domain performance.

Pages: 1926-1931

Page count: 4

50 items in total

[31] The AhoSR Automatic Speech Recognition System
Odriozola, Igor
Serrano, Luis
Hernaez, Inma
Navas, Eva
ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2014, 2014, 8854 : 279 - 288
[32] EARS: Electromyographical automatic recognition of speech
Jou, Szu-Chen Stan
Schultz, Tanja
BIOSIGNALS 2008: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON BIO-INSPIRED SYSTEMS AND SIGNAL PROCESSING, VOL 1, 2008, : 3 - +
[33] Automatic Recognition of Anger in Spontaneous Speech
Neiberg, Daniel
Elenius, Kjell
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2755 - 2758
[34] Automatic intelligibility assessment of pathologic speech over the telephone
Haderlein, Tino
Noeth, Elmar
Batliner, Anton
Eysholdt, Ulrich
Rosanowski, Frank
LOGOPEDICS PHONIATRICS VOCOLOGY, 2011, 36 (04) : 175 - 181
[35] Detecting breathing sounds in realistic Japanese telephone conversations and its application to automatic speech recognition
Fukuda, Takashi
Ichikawa, Osamu
Nishimura, Masafumi
SPEECH COMMUNICATION, 2018, 98 : 95 - 103
[36] Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition
Ni, Junrui
Wang, Liming
Gao, Heting
Qian, Kaizhi
Zhang, Yang
Chang, Shiyu
Hasegawa-Johnson, Mark
INTERSPEECH 2022, 2022, : 461 - 465
[37] Comparing Humans and Automatic Speech Recognition Systems in Recognizing Dysarthric Speech
Mengistu, Kinfe Tadesse
Rudzicz, Frank
ADVANCES IN ARTIFICIAL INTELLIGENCE, 2011, 6657 : 291 - 300
[38] A Study on Speech Coders for Automatic Speech Recognition in Adverse Communication Environments
Choi, Seung Ho
INFORMATICS ENGINEERING AND INFORMATION SCIENCE, PT II, 2011, 252 : 67 - 75
[39] Auditory driven subband speech enhancement for automatic recognition of noisy speech
Upadhyay, N.
Rosales, H. G.
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2016, 19 (04) : 869 - 880
[40] Handling of true and pseudo wideband speech signals in automatic speech recognition
Ben Salah, Mohamed-Ali
Monne, Jean
Jouvet, Denis
Andre-Obrecht, Regine
PROCEEDINGS OF THE 8TH WSEAS INTERNATIONAL CONFERENCE ON SIGNAL, SPEECH AND IMAGE PROCESSING (SSIP '08): SIGNAL, SPEECH AND IMAGE PROCESSING, 2008, : 39 - +
