Automatic source speaker selection for voice conversion

Cited by: 0
Authors
Turk, Oytun [1]
Arslan, Levent M. [2]
Affiliations
[1] Bogazici Univ, Dept Elect & Elect Engn, TR-34342 Istanbul, Turkey
[2] Sestek Inc, R&D Dept, TR-34342 Istanbul, Turkey
Source
Keywords
hearing; learning (artificial intelligence); neural nets; regression analysis; speaker recognition; speech coding; processing techniques; transformation; quality
DOI
10.1121/1.3027445
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper examines the importance of source speaker selection for a voice conversion algorithm based on weighted codebook mapping. First, the dependency on source speakers is evaluated in a subjective listening test using 180 different source-target pairs from a database of 20 speakers. Subjective scores are obtained for similarity to the target speaker's voice and for quality. Statistical analysis of the scores confirms that performance depends on the source speaker for both male-to-male and female-to-female transformations. A source speaker selection algorithm is then devised for a given target speaker and a set of source speaker candidates. For this purpose, an artificial neural network (ANN) is trained to learn the regression between a set of acoustical distance measures and the subjective scores. The estimated scores are used to rank the source speakers. The average cross-correlation coefficient between rankings obtained from median subjective scores and rankings estimated by the algorithm is 0.84 for similarity and 0.78 for quality in male-to-male transformations. The results for female-to-female transformations are less reliable, with a cross-correlation value of 0.58 for both similarity and quality.
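The ranking and evaluation step described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's ANN or its acoustical distance measures; it only assumes that estimated scores and median subjective scores are already available for each source speaker candidate. The speaker names and score values are hypothetical, and treating the "cross-correlation coefficient between rankings" as a Pearson correlation over rank positions is an assumption, not something the abstract specifies.

```python
from statistics import mean


def rank_candidates(scores):
    """Return candidate names ordered from highest to lowest score.

    Ties keep their insertion order because sorted() is stable.
    """
    return sorted(scores, key=scores.get, reverse=True)


def rank_positions(scores):
    """Map each candidate to its 1-based rank (1 = best)."""
    return {name: i + 1 for i, name in enumerate(rank_candidates(scores))}


def rank_correlation(scores_a, scores_b):
    """Pearson correlation between the rank positions of two score sets.

    Assumption: this stands in for the correlation measure between
    rankings mentioned in the abstract.
    """
    ranks_a = rank_positions(scores_a)
    ranks_b = rank_positions(scores_b)
    names = sorted(ranks_a)
    xs = [ranks_a[n] for n in names]
    ys = [ranks_b[n] for n in names]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for four source speaker candidates and one target.
estimated = {"src_a": 3.9, "src_b": 2.1, "src_c": 3.2, "src_d": 1.5}
subjective = {"src_a": 4.0, "src_b": 2.5, "src_c": 2.0, "src_d": 1.0}

best_first = rank_candidates(estimated)
agreement = rank_correlation(estimated, subjective)
```

In this sketch, `best_first[0]` would be the source speaker proposed for the target, and `agreement` measures how closely the estimated ranking follows the subjective one.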
Pages: 480-491
Number of pages: 12