ADAPTING SPEECH SEPARATION TO REAL-WORLD MEETINGS USING MIXTURE INVARIANT TRAINING

Times cited: 7
Authors
Sivaraman, Aswin [1 ,2 ]
Wisdom, Scott [1 ]
Erdogan, Hakan [1 ]
Hershey, John R. [1 ]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Indiana Univ, Bloomington, IN 47405 USA
Keywords
source separation; unsupervised learning; mixture invariant training; real-world audio processing
DOI
10.1109/ICASSP43922.2022.9747855
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The recently proposed mixture invariant training (MixIT) is an unsupervised method for training single-channel sound separation models, in that it does not require ground-truth isolated reference sources. In this paper, we investigate using MixIT to adapt a separation model on real far-field overlapping reverberant and noisy speech data from the AMI Corpus. The models are tested on real AMI recordings containing overlapping speech, and are evaluated subjectively by human listeners. To objectively evaluate our models, we also devise a synthetic AMI test set. For human evaluations on real recordings, we also propose a modification of the standard MUSHRA protocol to handle imperfect reference signals, which we call MUSHIRA. Holding network architectures constant, we find that a fine-tuned semi-supervised model yields the highest SI-SNR improvement, PESQ scores, and human listening ratings across synthetic and real datasets, outperforming unadapted generalist models trained on orders of magnitude more data. Our results show that unsupervised learning through MixIT enables model adaptation on real-world unlabeled spontaneous speech recordings.
Pages: 686-690
Page count: 5