ADAPTING SPEECH SEPARATION TO REAL-WORLD MEETINGS USING MIXTURE INVARIANT TRAINING

Cited by: 7
Authors
Sivaraman, Aswin [1 ,2 ]
Wisdom, Scott [1 ]
Erdogan, Hakan [1 ]
Hershey, John R. [1 ]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
[2] Indiana Univ, Bloomington, IN 47405 USA
Keywords
source separation; unsupervised learning; mixture invariant training; real-world audio processing;
DOI
10.1109/ICASSP43922.2022.9747855
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The recently-proposed mixture invariant training (MixIT) is an unsupervised method for training single-channel sound separation models because it does not require ground-truth isolated reference sources. In this paper, we investigate using MixIT to adapt a separation model on real far-field overlapping reverberant and noisy speech data from the AMI Corpus. The models are tested on real AMI recordings containing overlapping speech, and are evaluated subjectively by human listeners. To objectively evaluate our models, we also devise a synthetic AMI test set. For human evaluations on real recordings, we also propose a modification of the standard MUSHRA protocol to handle imperfect reference signals, which we call MUSHIRA. Holding network architectures constant, we find that a fine-tuned semi-supervised model yields the largest SI-SNR improvement, PESQ scores, and human listening ratings across synthetic and real datasets, outperforming unadapted generalist models trained on orders of magnitude more data. Our results show that unsupervised learning through MixIT enables model adaptation on real-world unlabeled spontaneous speech recordings.
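The core idea of MixIT mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a plain negative-SNR loss, an exhaustive search over binary source-to-mixture assignments, and NumPy arrays in place of a trained separation network. The names `snr_db` and `mixit_loss` are hypothetical. Two reference mixtures are summed, a model would separate that sum into M sources, and the loss keeps the assignment of sources to mixtures that best reconstructs each original mixture, so no isolated reference sources are needed.

```python
import itertools
import numpy as np


def snr_db(reference, estimate, eps=1e-8):
    """Signal-to-noise ratio in dB of `estimate` against `reference`."""
    noise = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(noise ** 2) + eps))


def mixit_loss(mix1, mix2, est_sources):
    """Return (loss, assignment) for the best binary remixing of the estimates.

    est_sources has shape (M, T). assignment[m] is 0 if source m is remixed
    into mix1 and 1 if it is remixed into mix2. The search is exhaustive
    (2**M candidates), which is only practical for small M.
    """
    num_sources = est_sources.shape[0]
    best_loss, best_assignment = np.inf, None
    for assignment in itertools.product((0, 1), repeat=num_sources):
        mask = np.array(assignment, dtype=bool)
        remix1 = est_sources[~mask].sum(axis=0)  # sources assigned to mix1
        remix2 = est_sources[mask].sum(axis=0)   # sources assigned to mix2
        loss = -(snr_db(mix1, remix1) + snr_db(mix2, remix2))
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment
```

In a training loop, `est_sources` would be the network output for the input `mix1 + mix2`, and the returned loss would be minimized with a differentiable framework; this sketch only shows the assignment search.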
Pages: 686-690
Number of pages: 5