Fully Unsupervised Small-Vocabulary Speech Recognition Using a Segmental Bayesian Model

Authors
Kamper, Herman [1, 2]
Jansen, Aren [3, 4]
Goldwater, Sharon [2]
Affiliations
[1] Univ Edinburgh, CSTR, Sch Informat, Edinburgh EH8 9YL, Midlothian, Scotland
[2] Univ Edinburgh, ILCC, Sch Informat, Edinburgh EH8 9YL, Midlothian, Scotland
[3] Johns Hopkins Univ, HLTCOE, Baltimore, MD 21218 USA
[4] Johns Hopkins Univ, CLSP, Baltimore, MD 21218 USA
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
unsupervised speech processing; word discovery; speech segmentation; unsupervised learning; segmental models;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Current supervised speech technology relies heavily on transcribed speech and pronunciation dictionaries. In settings where unlabelled speech data alone is available, unsupervised methods are required to discover categorical linguistic structure directly from the audio. We present a novel Bayesian model which segments unlabelled input speech into word-like units, resulting in a complete unsupervised transcription of the speech in terms of discovered word types. In our approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional space; the model (implemented as a Gibbs sampler) then builds a whole-word acoustic model in this space while jointly doing segmentation. We report word error rates in a connected digit recognition task by mapping the unsupervised output to ground truth transcriptions. Our model outperforms a previously developed HMM-based system, even when the model is not constrained to discover only the 11 word types present in the data.
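The evaluation step mentioned in the abstract (mapping unsupervised output to ground truth transcriptions, then reporting word error rate) can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes each discovered cluster is mapped to the true word it overlaps with most often, and that a hypothetical list `aligned_pairs` of (cluster ID, true word) overlaps is already available. All function names, variable names, and data below are illustrative.

```python
from collections import Counter, defaultdict


def map_clusters_to_words(aligned_pairs):
    """Map each cluster ID to the true word it co-occurs with most often.

    `aligned_pairs` is a hypothetical, pre-computed list of
    (cluster_id, true_word) overlaps; how it is obtained is assumed here.
    """
    counts = defaultdict(Counter)
    for cluster_id, true_word in aligned_pairs:
        counts[cluster_id][true_word] += 1
    return {c: ctr.most_common(1)[0][0] for c, ctr in counts.items()}


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Total edit distance divided by total number of reference words."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total


# Illustrative toy data (not from the paper).
aligned_pairs = [(7, "one"), (7, "one"), (7, "nine"), (3, "two"), (3, "two")]
mapping = map_clusters_to_words(aligned_pairs)

cluster_outputs = [[7, 3], [3, 7, 7]]
references = [["one", "two"], ["two", "nine", "one"]]
hypotheses = [[mapping[c] for c in utt] for utt in cluster_outputs]
wer = word_error_rate(references, hypotheses)
```

In this toy case cluster 7 is mapped to "one", so the single "nine" token it absorbed counts as one substitution out of five reference words.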
Pages: 678-682
Page count: 5