Fully Unsupervised Small-Vocabulary Speech Recognition Using a Segmental Bayesian Model

Cited by: 0

Authors:

Kamper, Herman [1,2]

Jansen, Aren [3,4]

Goldwater, Sharon [2]

Affiliations:

[1] Univ Edinburgh, CSTR, Sch Informat, Edinburgh EH8 9YL, Midlothian, Scotland

[2] Univ Edinburgh, ILCC, Sch Informat, Edinburgh EH8 9YL, Midlothian, Scotland

[3] Johns Hopkins Univ, HLTCOE, Baltimore, MD 21218 USA

[4] Johns Hopkins Univ, CLSP, Baltimore, MD 21218 USA

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

unsupervised speech processing; word discovery; speech segmentation; unsupervised learning; segmental models;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Current supervised speech technology relies heavily on transcribed speech and pronunciation dictionaries. In settings where unlabelled speech data alone is available, unsupervised methods are required to discover categorical linguistic structure directly from the audio. We present a novel Bayesian model which segments unlabelled input speech into word-like units, resulting in a complete unsupervised transcription of the speech in terms of discovered word types. In our approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional space; the model (implemented as a Gibbs sampler) then builds a whole-word acoustic model in this space while jointly doing segmentation. We report word error rates in a connected digit recognition task by mapping the unsupervised output to ground truth transcriptions. Our model outperforms a previously developed HMM-based system, even when the model is not constrained to discover only the 11 word types present in the data.
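The abstract states only that word error rates are obtained "by mapping the unsupervised output to ground truth transcriptions"; it does not say how the mapping is built. The following is a minimal sketch of one plausible scoring procedure, assuming a many-to-one majority-vote mapping from discovered cluster IDs to true words, and assuming that aligned (cluster ID, true word) pairs are already available. The function names, the toy data, and the majority-vote rule are all illustrative assumptions, not the authors' evaluation code.

```python
from collections import Counter, defaultdict


def majority_mapping(pairs):
    """Map each cluster ID to the true word it is most often aligned with.

    `pairs` is an iterable of (cluster_id, true_word) tuples; how these
    alignments are obtained is outside this sketch (assumption).
    """
    counts = defaultdict(Counter)
    for cluster_id, true_word in pairs:
        counts[cluster_id][true_word] += 1
    return {c: ctr.most_common(1)[0][0] for c, ctr in counts.items()}


def edit_distance(ref, hyp):
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def word_error_rate(refs, hyps):
    """Total edit errors divided by total number of reference words."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Toy data (hypothetical): three discovered clusters, two utterances.
aligned_pairs = [(0, "one"), (0, "one"), (0, "nine"), (1, "two"), (2, "two")]
mapping = majority_mapping(aligned_pairs)

cluster_output = [[0, 1], [2, 0, 0]]
references = [["one", "two"], ["two", "nine", "one"]]
hypotheses = [[mapping[c] for c in utt] for utt in cluster_output]

wer = word_error_rate(references, hypotheses)
```

Because the mapping is many-to-one, several discovered clusters can map to the same true word, which is what allows scoring a system that discovers more word types than the 11 present in the data.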


Pages: 678-682

Number of pages: 5

8 records in total

[1] A segmental framework for fully-unsupervised large-vocabulary speech recognition
Kamper, Herman
Jansen, Aren
Goldwater, Sharon
COMPUTER SPEECH AND LANGUAGE, 2017, 46 : 154 - 174
[2] Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition
Beck, Eugen
Hannemann, Mirko
Doetsch, Patrick
Schlueter, Ralf
Ney, Hermann
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 766 - 770
[3] AN EMBEDDED SEGMENTAL K-MEANS MODEL FOR UNSUPERVISED SEGMENTATION AND CLUSTERING OF SPEECH
Kamper, Herman
Livescu, Karen
Goldwater, Sharon
2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 719 - 726
[4] JSUM: A Multitask Learning Speech Recognition Model for Jointly Supervised and Unsupervised Learning
Yolwas, Nurmemet
Meng, Weijing
APPLIED SCIENCES-BASEL, 2023, 13 (09):
[5] Analysis of the characteristics of English part of speech based on unsupervised machine learning and image recognition model
Li, Pengpeng
Jiang, Shuai
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (02) : 1891 - 1901
[6] Fully Unsupervised Word Learning from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events
Rasanen, Okko Johannes
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2922 - 2925
[7] Unsupervised Cross-Corpus Speech Emotion Recognition Using a Multi-Source Cycle-GAN
Su, Bo-Hao
Lee, Chi-Chun
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (03) : 1991 - 2004
[8] Unsupervised machine learning and image recognition model application in English part-of-speech feature learning under the open platform environment
Yang, Liu
SOFT COMPUTING, 2023, 27 (14) : 10013 - 10023