HIGH-FIDELITY NEURAL PHONETIC POSTERIORGRAMS
Cited by: 1
Authors:
Churchwell, Cameron [1]
Morrison, Max [1]
Pardo, Bryan [1]
Affiliations:
[1] Northwestern Univ, Evanston, IL 60208 USA
Source:
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024 | 2024
Keywords:
interpretable;
ppg;
pronunciation;
representation;
DOI:
10.1109/ICASSPW62465.2024.10669905
Chinese Library Classification (CLC) number:
O42 [Acoustics];
Discipline classification codes:
070206;
082403;
Abstract:
A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes). PPGs are a popular representation in speech generation due to their ability to disentangle pronunciation features from speaker identity, allowing accurate reconstruction of pronunciation (e.g., voice conversion) and coarse-grained pronunciation editing (e.g., foreign accent conversion). In this paper, we demonstrably improve the quality of PPGs to produce a state-of-the-art interpretable PPG representation. We train an off-the-shelf speech synthesizer using our PPG representation and show that high-quality PPGs yield independent control over pitch and pronunciation. We further demonstrate novel uses of PPGs, such as an acoustic pronunciation distance and fine-grained pronunciation control.
Pages: 823-827
Page count: 5