CDPAM: CONTRASTIVE LEARNING FOR PERCEPTUAL AUDIO SIMILARITY

Cited by: 17
Authors
Manocha, Pranay [1]
Jin, Zeyu [2]
Zhang, Richard [2]
Finkelstein, Adam [1]
Affiliations
[1] Princeton Univ, Princeton, NJ 08544 USA
[2] Adobe Res, San Jose, CA USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
perceptual similarity; audio quality; deep metric; speech enhancement; speech synthesis;
DOI
10.1109/ICASSP39728.2021.9413711
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Many speech processing methods based on deep learning require an automatic and differentiable audio metric for the loss function. The DPAM approach of Manocha et al. [1] learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception. However, it requires a large number of human annotations and does not generalize well outside the range of perturbations on which it was trained. This paper introduces CDPAM - a metric that builds on and advances DPAM. The primary improvement is to combine contrastive learning and multi-dimensional representations to build robust models from limited data. In addition, we collect human judgments on triplet comparisons to improve generalization to a broader range of audio perturbations. CDPAM correlates well with human responses across nine varied datasets. We also show that adding this metric to existing speech synthesis and enhancement methods yields significant improvement, as measured by objective and subjective tests.
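The abstract mentions human judgments on triplet comparisons but does not say how they enter training. One common way to turn such a judgment into a differentiable training signal is a triplet margin loss. The sketch below is a minimal illustration under that assumption and is not the paper's implementation: the function names, the Euclidean distance, the margin value, and the toy embeddings are all hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(reference, closer, farther, margin=0.1):
    """Hinge-style loss for one triplet judgment (illustrative assumption).

    A listener judged `closer` to sound more like `reference` than
    `farther` does. The loss is zero once the learned distance to
    `closer` is smaller than the distance to `farther` by at least
    `margin`; otherwise it grows linearly with the violation.
    """
    d_close = l2_distance(reference, closer)
    d_far = l2_distance(reference, farther)
    return max(0.0, d_close - d_far + margin)


# Toy embeddings standing in for the output of an audio encoder.
reference = [0.0, 0.0]
slightly_perturbed = [0.1, 0.0]
heavily_perturbed = [1.0, 0.0]

# Distances agree with the human judgment, so the loss vanishes.
loss_agree = triplet_margin_loss(reference, slightly_perturbed, heavily_perturbed)

# Distances contradict the judgment, so the loss is positive.
loss_disagree = triplet_margin_loss(reference, heavily_perturbed, slightly_perturbed)
```

In a real training setup the embeddings would come from a learned encoder and the loss would be averaged over many annotated triplets.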
Pages: 196-200
Page count: 5