Turbo Processing for Speech Recognition

Cited by: 2
Authors
Moon, Todd K. [1 ,2 ]
Gunther, Jacob H. [1 ,2 ]
Broadus, Cortnie [3 ]
Hou, Wendy [4 ]
Nelson, Nils [3 ]
Affiliations
[1] Utah State Univ, Informat Dynam Lab, Logan, UT 84322 USA
[2] Utah State Univ, Dept Elect & Comp Engn, Logan, UT 84322 USA
[3] Utah State Univ, Dept Math, Logan, UT 84322 USA
[4] Yale Univ, Dept Math, New Haven, CT 06511 USA
Keywords
Human-machine interface; speech processing; turbo processing; HIDDEN MARKOV-MODELS; MAXIMIZATION;
DOI
10.1109/TCYB.2013.2247593
CLC number (Chinese Library Classification)
TP [automation technology; computer technology];
Discipline code
0812;
Abstract
Speech recognition is a classic example of a human/machine interface, typifying many of the difficulties and opportunities of human/machine interaction. In this paper, speech recognition is used as an example of applying turbo processing principles to the general problem of the human/machine interface. Speech recognizers frequently involve a model representing phonemic information at a local level, followed by a language model representing information at a nonlocal level. This structure is analogous to the local (e.g., equalizer) and nonlocal (e.g., error-correction decoding) elements common in digital communications. Drawing on the analogy with turbo processing in digital communications, turbo speech processing iteratively feeds the output of the language model back to serve as prior probabilities for the phonemic model. This analogy is developed here, and the performance of the turbo model is characterized using an artificial language model. With turbo processing, the relative error rate improves significantly, especially in high-noise settings.
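The feedback loop sketched in the abstract (local phonemic evidence combined with priors fed back from a nonlocal language model) can be illustrated with a toy iteration. Everything below is an assumption for illustration, not the paper's actual models: a random per-frame "acoustic" likelihood table stands in for the phonemic model, and a random bigram matrix stands in for the language model.

```python
import numpy as np

# Toy turbo-style iteration: local acoustic evidence and a nonlocal bigram
# language model exchange probabilities over several passes.
# All sizes and values are illustrative, not taken from the paper.

rng = np.random.default_rng(0)
V = 3  # symbol alphabet size
T = 5  # sequence length (frames)

# "Acoustic model": per-frame likelihoods p(observation | symbol), rows sum to 1
acoustic = rng.dirichlet(np.ones(V), size=T)   # shape (T, V)

# "Language model": bigram transitions p(symbol_t | symbol_{t-1})
bigram = rng.dirichlet(np.ones(V), size=V)     # shape (V, V)

prior = np.full((T, V), 1.0 / V)  # start from uniform priors
for _ in range(5):                # turbo iterations
    # Local stage: combine acoustic evidence with the current priors
    post = acoustic * prior
    post /= post.sum(axis=1, keepdims=True)

    # Nonlocal stage: the language model propagates information across
    # frames, producing new priors that are fed back to the local stage
    new_prior = post.copy()
    new_prior[1:] = post[:-1] @ bigram  # predict each frame from its predecessor
    new_prior /= new_prior.sum(axis=1, keepdims=True)
    prior = new_prior

decoded = post.argmax(axis=1)  # per-frame symbol decisions
print(decoded)
```

In the paper's actual system the local stage is an HMM phonemic model and the nonlocal stage is a language model over words, analogous to the equalizer/decoder pair in a turbo-equalized communication receiver; this sketch only shows the iterative exchange of priors.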
Pages: 83-91
Page count: 9
Related papers
50 records
  • [41] Time frequency representation for speech recognition
    Amsalem, Avishay
    Shallom, Ilan D.
    2006 INTERNATIONAL CONFERENCE ON INFORMATION AND TECHNOLOGY: RESEARCH AND EDUCATION, 2006, : 99 - +
  • [42] Counterfactually Fair Automatic Speech Recognition
    Sari, Leda
    Hasegawa-Johnson, Mark
    Yoo, Chang D.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3515 - 3525
  • [43] Automatic emotion recognition by the speech signal
    Schuller, B
    Lang, M
    Rigoll, G
    6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL IX, PROCEEDINGS: IMAGE, ACOUSTIC, SPEECH AND SIGNAL PROCESSING II, 2002, : 367 - 372
  • [44] Acoustic Model Adaptation for Speech Recognition
    Shinoda, Koichi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): : 2348 - 2362
  • [45] A novel duration model for speech recognition
    Yuan, Lichi
    Wan, Changxuan
    PROCEEDINGS OF THE FOURTH IASTED INTERNATIONAL CONFERENCE ON CIRCUITS, SIGNALS, AND SYSTEMS, 2006, : 279 - +
  • [46] On Continuous Speech Recognition of Indian English
    Jin, Xin
    Zhang, Keliang
    Huang, Xian
    Miao, Min
    2018 INTERNATIONAL CONFERENCE ON ALGORITHMS, COMPUTING AND ARTIFICIAL INTELLIGENCE (ACAI 2018), 2018,
  • [47] SYNTACTIC AND SEMANTIC POSTPROCESSING FOR SPEECH RECOGNITION
    KRALLMANN, H
    MARZI, R
    DECISION SUPPORT SYSTEMS, 1991, 7 (03) : 253 - 261
  • [48] SPEECH RECOGNITION IN THE NOISY CAR ENVIRONMENT
    RUEHL, HW
    DOBLER, S
    WEITH, J
    MEYER, P
    NOLL, A
    HAMER, HH
    PIOTROWSKI, H
    SPEECH COMMUNICATION, 1991, 10 (01) : 11 - 22
  • [49] Diacritics Effect on Arabic Speech Recognition
    Abed, Sa'ed
    Alshayeji, Mohammad
    Sultan, Sari
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2019, 44 (11) : 9043 - 9056
  • [50] Towards automatic recognition of emotion in speech
    Razak, AA
    Yusof, MHM
    Komiya, R
    PROCEEDINGS OF THE 3RD IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY, 2003, : 548 - 551