Voice Privacy Using Time-Scale and Pitch Modification

Authors
Singh D.K. [1]
Prajapati G.P. [1]
Patil H.A. [1]
Affiliations
[1] Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar
Keywords
Anonymization; Data augmentation; Speech perturbation; Voice privacy
DOI
10.1007/s42979-023-02549-8
Abstract
The growing digitization of day-to-day work has led to a surge in the use of Intelligent Personal Assistants. Because these smart digital assistants rely on personally identifiable characteristics of the user, their extensive use calls for security and privacy preservation techniques. Various privacy preservation techniques have been explored for different types of digital assistants, and voice-based digital assistants likewise need one. In this study, we therefore explore prosody modification methods that alter the speaker-specific characteristics of the user, so that the modified utterances can be made publicly available for training different speech-based systems. The study presents three data augmentation techniques (speed, pitch, and tempo perturbation) as voice anonymization methods that modify speaker-dependent speech parameters (i.e., the fundamental frequency, F0). Voice anonymization and speech intelligibility are measured objectively using automatic speaker verification (ASV) and automatic speech recognition (ASR) experiments, respectively, on the development and test sets of the LibriSpeech dataset. For speed perturbation-based anonymization, a relative increase in equal error rate (% EER) of up to 53.7% is observed at a perturbation factor of α = 0.8 for both male and female speakers. In the same case, the word error rate (% WER) remained adequate (lower than that of the baseline system), which supports the use of speed perturbation as the anonymization algorithm in a voice privacy system. Similar performance is observed for pitch perturbation with a perturbation factor of λ = −300. Tempo perturbation, however, was not found to be useful for speaker anonymization in these experiments, with % EER on the order of 5–10%. © 2024, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
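To make the role of the perturbation factor α concrete, here is a minimal sketch of speed perturbation in Python. It assumes that speed perturbation is done by plain resampling, as is common in data augmentation toolchains; the abstract does not state the authors' implementation. The function names `speed_perturb` and `dominant_frequency`, the linear interpolation, and the synthetic 200 Hz tone are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def speed_perturb(signal: np.ndarray, alpha: float) -> np.ndarray:
    """Resample `signal` so that it plays back `alpha` times as fast.

    Assumption (not from the paper): speed perturbation is plain
    resampling, so duration scales by 1/alpha and every frequency,
    including the pitch, scales by alpha.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n_out = int(round(len(signal) / alpha))
    # Positions in the original signal from which output samples are read.
    src_positions = np.arange(n_out) * alpha
    # Linear interpolation keeps the sketch short; a real resampler
    # would use a band-limited (anti-aliasing) filter instead.
    return np.interp(src_positions, np.arange(len(signal)), signal)


def dominant_frequency(signal: np.ndarray, sample_rate: int) -> float:
    """Return the frequency (Hz) of the strongest spectral peak."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate      # one second of samples
tone = np.sin(2 * np.pi * 200.0 * t)          # 200 Hz stand-in for a voiced sound

slowed = speed_perturb(tone, alpha=0.8)       # alpha value quoted in the abstract
print(len(tone), len(slowed))
print(dominant_frequency(tone, sample_rate), dominant_frequency(slowed, sample_rate))
```

Under this resampling assumption, α = 0.8 stretches the signal to 1/0.8 = 1.25 times its original duration and lowers every frequency to 0.8 times its original value, so timing and pitch change together. Tempo perturbation, by contrast, is usually understood to change duration while leaving pitch in place, which may relate to why it offered less anonymization in the reported experiments.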