HIGH-FIDELITY NEURAL PHONETIC POSTERIORGRAMS

Times cited: 1
Authors
Churchwell, Cameron [1]
Morrison, Max [1]
Pardo, Bryan [1]
Affiliation
[1] Northwestern Univ, Evanston, IL 60208 USA
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024 | 2024
Keywords
interpretable; ppg; pronunciation; representation;
DOI
10.1109/ICASSPW62465.2024.10669905
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
A phonetic posteriorgram (PPG) is a time-varying categorical distribution over acoustic units of speech (e.g., phonemes). PPGs are a popular representation in speech generation due to their ability to disentangle pronunciation features from speaker identity, allowing accurate reconstruction of pronunciation (e.g., voice conversion) and coarse-grained pronunciation editing (e.g., foreign accent conversion). In this paper, we demonstrably improve the quality of PPGs to produce a state-of-the-art interpretable PPG representation. We train an off-the-shelf speech synthesizer using our PPG representation and show that high-quality PPGs yield independent control over pitch and pronunciation. We further demonstrate novel uses of PPGs, such as an acoustic pronunciation distance and fine-grained pronunciation control.
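The definition of a PPG as a time-varying categorical distribution can be illustrated with a minimal Python sketch: each time frame gets one probability distribution over a phoneme inventory, obtained here by applying a softmax to per-frame classifier logits. The sketch assumes a hypothetical toy four-symbol inventory and made-up logits, and it uses the mean per-frame Jensen-Shannon divergence between two time-aligned PPGs as an example distance. That distance is an illustrative assumption, not necessarily the acoustic pronunciation distance defined in the paper, and all function names are hypothetical.

```python
import math
from typing import List

# Hypothetical toy phoneme inventory (assumption); real PPGs use a full phone set.
PHONEMES = ["AA", "B", "S", "<silent>"]

Frame = List[float]  # one categorical distribution over PHONEMES
PPG = List[Frame]    # one distribution per time frame


def softmax(logits: Frame) -> Frame:
    """Convert one frame of (hypothetical) classifier logits into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_ppg(frame_logits: List[Frame]) -> PPG:
    """Build a PPG: a per-frame categorical distribution over phonemes."""
    return [softmax(frame) for frame in frame_logits]


def js_divergence(p: Frame, q: Frame) -> float:
    """Jensen-Shannon divergence (natural log), bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: Frame, b: Frame) -> float:
        # Terms with a_i == 0 contribute nothing to the KL divergence.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def ppg_distance(ppg_a: PPG, ppg_b: PPG) -> float:
    """Mean per-frame JS divergence between two time-aligned PPGs (assumed metric)."""
    if not ppg_a or len(ppg_a) != len(ppg_b):
        raise ValueError("PPGs must be non-empty and time-aligned (same frame count)")
    return sum(js_divergence(p, q) for p, q in zip(ppg_a, ppg_b)) / len(ppg_a)


# Usage: two 2-frame PPGs that differ only in the first frame.
ppg_a = make_ppg([[2.0, 0.1, 0.1, 0.1], [0.1, 0.1, 3.0, 0.1]])
ppg_b = make_ppg([[0.1, 2.0, 0.1, 0.1], [0.1, 0.1, 3.0, 0.1]])
print(ppg_distance(ppg_a, ppg_a))  # 0.0
print(ppg_distance(ppg_a, ppg_b))  # positive, at most ln 2
```

Jensen-Shannon divergence is used in this sketch because it is symmetric and bounded, which makes per-frame scores easy to average and compare; a real system would also need to time-align the two utterances first, which the sketch simply assumes.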
Pages: 823-827
Page count: 5