CDPAM: CONTRASTIVE LEARNING FOR PERCEPTUAL AUDIO SIMILARITY

Times cited: 17

Authors:

Manocha, Pranay [1]

Jin, Zeyu [2]

Zhang, Richard [2]

Finkelstein, Adam [1]

Affiliations:

[1] Princeton Univ, Princeton, NJ 08544 USA

[2] Adobe Res, San Jose, CA USA

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

perceptual similarity; audio quality; deep metric; speech enhancement; speech synthesis;

DOI:

10.1109/ICASSP39728.2021.9413711

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206; 082403;

Abstract:

Many speech processing methods based on deep learning require an automatic and differentiable audio metric for the loss function. The DPAM approach of Manocha et al. [1] learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception. However, it requires a large number of human annotations and does not generalize well outside the range of perturbations on which it was trained. This paper introduces CDPAM, a metric that builds on and advances DPAM. The primary improvement is to combine contrastive learning and multi-dimensional representations to build robust models from limited data. In addition, we collect human judgments on triplet comparisons to improve generalization to a broader range of audio perturbations. CDPAM correlates well with human responses across nine varied datasets. We also show that adding this metric to existing speech synthesis and enhancement methods yields significant improvement, as measured by objective and subjective tests.
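The abstract describes learning a full-reference metric with contrastive learning and checking it against human triplet judgments. As an illustration only, the sketch below shows generic versions of both ideas in plain Python. Assumptions: embeddings are lists of floats standing in for the output of a learned audio encoder (not shown); the contrastive term is the standard InfoNCE form used in SimCLR-style training, not necessarily CDPAM's exact objective; `embedding_distance` is a hypothetical mean-absolute-difference distance; and each triplet is assumed to record a reference, two candidates, and which candidate listeners judged closer to the reference. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
# Hypothetical triplet layout: (reference, candidate_a, candidate_b, human_pick).
Triplet = Tuple[Vector, Vector, Vector, str]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchor: Vector, positive: Vector,
                  negatives: Sequence[Vector], temperature: float = 0.1) -> float:
    """Generic InfoNCE contrastive loss for one anchor (SimCLR-style, assumed).

    Low when the anchor is much closer to its positive than to every negative.
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


def embedding_distance(ref: Vector, test: Vector) -> float:
    """Hypothetical full-reference distance: mean absolute embedding difference."""
    return sum(abs(r - t) for r, t in zip(ref, test)) / len(ref)


def triplet_agreement(triplets: Sequence[Triplet]) -> float:
    """Fraction of triplets where the metric picks the same candidate as listeners.

    human_pick is "a" or "b", naming the candidate judged closer to the
    reference. Ties in distance count as a metric pick of "b".
    """
    agree = 0
    for ref, a, b, human_pick in triplets:
        closer_is_a = embedding_distance(ref, a) < embedding_distance(ref, b)
        metric_pick = "a" if closer_is_a else "b"
        if metric_pick == human_pick:
            agree += 1
    return agree / len(triplets)


if __name__ == "__main__":
    ref = [1.0, 0.0, 0.5]     # toy embedding of a clean reference
    mild = [0.9, 0.1, 0.5]    # toy embedding of a lightly perturbed version
    harsh = [0.0, 1.0, -0.5]  # toy embedding of a heavily perturbed version
    print(info_nce_loss(ref, mild, [harsh]))  # small: positive is far closer
    print(triplet_agreement([(ref, mild, harsh, "a"), (ref, harsh, mild, "b")]))  # 1.0
```

The temperature of 0.1 is an arbitrary illustrative choice; in this generic form, lowering it sharpens the penalty on negatives that sit close to the anchor.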


Pages: 196-200

Number of pages: 5

Records (50 in total; first 10 shown):

[1] Audio Retrieval Based on Perceptual Similarity
Zhang, Teng
Wu, Ji
Wang, Dingding
Li, Tao
2014 INTERNATIONAL CONFERENCE ON COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING (COLLABORATECOM), 2014: 342-348
[2] Asymmetric Contrastive Learning for Audio Fingerprinting
Wu, Xinyu
Wang, Hongxia
IEEE SIGNAL PROCESSING LETTERS, 2022, 29: 1873-1877
[3] Contrastive Similarity Matching for Supervised Learning
Qin, Shanshan
Mudur, Nayantara
Pehlevan, Cengiz
NEURAL COMPUTATION, 2021, 33(05): 1300-1328
[4] Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning
Denize, Julien
Rabarisoa, Jaonary
Orcesi, Astrid
Herault, Romain
Canu, Stephane
2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023: 2705-2715
[5] Similarity Preserving Adversarial Graph Contrastive Learning
In, Yeonjun
Yoon, Kanghoon
Park, Chanyoung
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023: 867-878
[6] Neural Graph Similarity Computation with Contrastive Learning
Hu, Shengze
Zeng, Weixin
Zhang, Pengfei
Tang, Jiuyang
APPLIED SCIENCES-BASEL, 2022, 12(15)
[7] Efficient Trajectory Similarity Computation with Contrastive Learning
Deng, Liwei
Zhao, Yan
Fu, Zidan
Sun, Hao
Liu, Shuncheng
Zheng, Kai
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022: 365-374
[8] Contrastive Learning of General-Purpose Audio Representations
Saeed, Aaqib
Grangier, David
Zeghidour, Neil
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 3875-3879
[9] Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
Chen, Chen
Hou, Nana
Hu, Yuchen
Zou, Heqing
Qi, Xiaofeng
Chng, Eng Siong
INTERSPEECH 2022, 2022: 2773-2777
[10] Perceptual audio watermarking by learning in wavelet domain
Gunsel, Bilge
Kirbiz, Serap
18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 3, PROCEEDINGS, 2006: 383+
