Modelling asynchrony in automatic speech recognition using loosely coupled hidden Markov models

Cited by: 0
Authors
Nock, HJ [1]
Young, SJ [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Keywords
automatic speech recognition; pronunciation modelling; loosely coupled hidden Markov models; variational approximation
DOI
Not available
Chinese Library Classification
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
Hidden Markov models (HMMs) have been successful for modelling the dynamics of carefully dictated speech, but their performance degrades severely when used to model conversational speech. Since speech is produced by a system of loosely coupled articulators, stochastic models explicitly representing this parallelism may have advantages for automatic speech recognition (ASR), particularly when trying to model the phonological effects inherent in casual spontaneous speech. This paper presents a preliminary feasibility study of one such model class: loosely coupled HMMs. Exact model estimation and decoding is potentially expensive, so possible approximate algorithms are also discussed. Comparison of one particular loosely coupled model on an isolated word task suggests loosely coupled HMMs merit further investigation. An approximate algorithm giving performance which is almost always statistically indistinguishable from the exact algorithm is also identified, making more extensive research computationally feasible. © 2002 Cognitive Science Society, Inc. All rights reserved.
Pages: 283-301
Page count: 19