Automatic self-supervised learning of associations between speech and text

Times cited: 0
Authors
Knuuttila, Juho [1]
Rasanen, Okko [1]
Laine, Unto K. [1]
Affiliation
[1] Aalto Univ, Sch Elect Engn, Dept Signal Proc & Acoust, Espoo, Finland
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013
Keywords
statistical learning; associative learning; multi-modal processing; unsupervised learning; self-supervised learning
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Discovering statistically significant patterns in data, and learning associative links between qualitatively different data streams, are becoming increasingly important for dealing with the so-called Big Data problem of modern society. In this work, a methodological framework is presented for the automatic discovery of statistical associations between a high-bit-rate, noisy sensory signal (speech) and temporally discrete categorical data with a different temporal granularity (text). The proposed approach does not use any phonetic or linguistic knowledge in the analysis; it simply learns the meaningful units of text and speech, and the mappings between them, in an unsupervised manner. The first experiments with a limited vocabulary of child-directed speech show that, after a period of learning, the method successfully generates a textual representation of continuous speech.
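The abstract does not spell out the learning algorithm, so the following is only a rough illustration of the general idea of learning speech-text associations from co-occurrence statistics, not the method of this paper. It assumes that speech has already been quantized into discrete unit labels (for example, vector-quantization cluster indices), that each utterance is paired with its text, and that associations are scored by positive pointwise mutual information (PMI) over utterance-level co-occurrence. The functions `learn_associations` and `decode`, as well as the toy data, are hypothetical.

```python
import math
from collections import Counter


def learn_associations(utterances):
    """Estimate utterance-level PMI between discrete speech units and words.

    utterances: list of (speech_units, words) pairs. speech_units is a
    sequence of hypothetical discrete labels (e.g. VQ cluster indices) and
    words is the accompanying text split into words. Both are assumptions.
    """
    n = len(utterances)
    unit_counts, word_counts, joint_counts = Counter(), Counter(), Counter()
    for units, words in utterances:
        unit_set, word_set = set(units), set(words)  # count presence, not frequency
        unit_counts.update(unit_set)
        word_counts.update(word_set)
        joint_counts.update((u, w) for u in unit_set for w in word_set)
    # PMI(u, w) = log(P(u, w) / (P(u) P(w))), probabilities estimated per utterance
    return {
        (u, w): math.log(c * n / (unit_counts[u] * word_counts[w]))
        for (u, w), c in joint_counts.items()
    }


def decode(pmi, units, vocabulary, top_k=1):
    """Rank words by summed positive PMI with the units present in new speech."""
    scores = {
        w: sum(max(pmi.get((u, w), 0.0), 0.0) for u in set(units))
        for w in vocabulary
    }
    # Highest score first; ties broken alphabetically for determinism
    return sorted(vocabulary, key=lambda w: (-scores[w], w))[:top_k]


# Hypothetical toy data: unit 0 occurs in every utterance, units 1-2 go with
# "dog", units 3-4 with "cat", and unit 5 with "a".
training = [
    ([0, 1, 2], ["the", "dog"]),
    ([0, 3, 4], ["the", "cat"]),
    ([0, 1, 2, 5], ["a", "dog"]),
    ([0, 3, 4, 5], ["a", "cat"]),
]
vocab = sorted({w for _, words in training for w in words})
pmi = learn_associations(training)
print(decode(pmi, [0, 1, 2], vocab))  # → ['dog']
```

Because unit 0 appears in every utterance, its PMI with every word is zero, so it contributes no evidence. Summing positive PMI treats each unit as independent evidence and ignores temporal order and segmentation of continuous speech, both of which a real system would have to handle.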
Pages: 465-469
Page count: 5