Hear Your Face: Face-based voice conversion with F0 estimation

Cited by: 0
Authors
Lee, Jaejun [1]
Oh, Yoori [1]
Hwang, Injune [1]
Lee, Kyogu [1, 2, 3]
Affiliations
[1] Seoul Natl Univ, Dept Intelligence & Informat, Seoul, South Korea
[2] Seoul Natl Univ, Interdisciplinary Program Artificial Intelligence, Seoul, South Korea
[3] Seoul Natl Univ, Artificial Intelligence Inst, Seoul, South Korea
Source
Keywords
voice conversion; face/voice association; cross modal generation; speaker embedding; IDENTITY
DOI
10.21437/Interspeech.2024-232
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper delves into the emerging field of face-based voice conversion, leveraging the unique relationship between an individual's facial features and their vocal characteristics. We present a novel face-based voice conversion framework that particularly utilizes the average fundamental frequency of the target speaker, derived solely from their facial images. Through extensive analysis, our framework demonstrates superior speech generation quality and the ability to align facial features with voice characteristics, including tracking of the target speaker's fundamental frequency.
Pages: 4378-4382
Page count: 5
Related papers
50 in total
  • [1] Face-based Voice Conversion: Learning the Voice behind a Face
    Lu, Hsiao-Han
    Weng, Shao-En
    Yen, Ya-Fan
    Shuai, Hong-Han
    Cheng, Wen-Huang
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 496-505
  • [2] Face-Based Illuminant Estimation
    Bianco, Simone
    Schettini, Raimondo
    COMPUTER VISION - ECCV 2012, PT III, 2012, 7585: 623-626
  • [3] A Novel Filtering-based F0 Estimation Algorithm with an Application to Voice Conversion
    Shah, Nirmesh J.
    Bachhav, Pramod B.
    Patil, Hemant A.
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017: 1579-1582
  • [4] VOICE CONVERSION BASED ON SIMULTANEOUS MODELING OF SPECTRUM AND F0
    Yutani, Kaori
    Uto, Yosuke
    Nankaku, Yoshihiko
    Lee, Akinobu
    Tokuda, Keiichi
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009: 3897-3900
  • [5] IMPROVED F0 MODELING AND GENERATION IN VOICE CONVERSION
    Kunikoshi, Aki
    Qian, Yao
    Soong, Frank
    Minematsu, Nobuaki
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011: 4568-4571
  • [6] F0 Transformation within the Voice Conversion Framework
    Hanzlicek, Zdenek
    Matousek, Jindrich
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007: 681-684
  • [7] FUSION NETWORK FOR FACE-BASED AGE ESTIMATION
    Wang, Haoyi
    Wei, Xingjie
    Sanchez, Victor
    Li, Chang-Tsun
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018: 2675-2679
  • [8] Can I Hear Your Face? Pervasive Attack on Voice Authentication Systems with a Single Face Image
    Jiang, Nan
    Sun, Bangjie
    Sim, Terence
    Han, Jun
    PROCEEDINGS OF THE 33RD USENIX SECURITY SYMPOSIUM, SECURITY 2024, 2024: 1045-1062
  • [9] HMM-Based Voice Conversion Using Quantized F0 Context
    Nose, Takashi
    Ota, Yuhei
    Kobayashi, Takao
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09): 2483-2490
  • [10] DeepAge: Deep Learning of face-based age estimation
    Sendik, Omry
    Keller, Yosi
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2019, 78: 368-375