Automatic self-supervised learning of associations between speech and text

Cited by: 0

Authors:

Knuuttila, Juho [1]

Rasanen, Okko [1]

Laine, Unto K. [1]

Affiliations:

[1] Aalto Univ, Sch Elect Engn, Dept Signal Proc & Acoust, Espoo, Finland

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

statistical learning; associative learning; multi-modal processing; unsupervised learning; self-supervised learning;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Discovery of statistically significant patterns from data and learning of associative links between qualitatively different data streams are becoming increasingly important in dealing with the so-called Big Data problem of modern society. In this work, a methodological framework is presented for automatic discovery of statistical associations between a high bit-rate and noisy sensory signal (speech) and temporally discrete categorical data with different temporal granularity (text). The proposed approach does not utilize any phonetic or linguistic knowledge in the analysis, but simply learns the meaningful units of text and speech and their mutual mappings in an unsupervised manner. The first experiments with a limited vocabulary of child-directed speech show that, after a period of learning, the method succeeds in generating a textual representation of continuous speech.


Pages: 465-469

Page count: 5

