SegINR: Segment-Wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech

Cited by: 0
Authors
Kim, Minchan [1, 2]
Jeong, Myeonghun [1, 2]
Lee, Joun Yeop [3]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 08826, South Korea
[3] Samsung Res, Seoul 06765, South Korea
Keywords
Semantics; Predictive models; Computational modeling; Transducers; Training; Indexes; Regulation; Linguistics; Computational efficiency; Implicit neural representation; sequence alignment; text-to-speech
DOI
10.1109/LSP.2025.3528858
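Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract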
We present SegINR, a novel approach to neural Text-to-Speech (TTS) that eliminates the need for either an auxiliary duration predictor or autoregressive (AR) sequence modeling for alignment. SegINR simplifies the TTS process by directly converting text sequences into frame-level features. Encoded text embeddings are transformed into segments of frame-level features with length regulation using a conditional implicit neural representation (INR). This method, termed Segment-wise INR (SegINR), captures temporal dynamics within each segment while autonomously defining segment boundaries, resulting in lower computational costs. Integrated into a two-stage TTS framework, SegINR is employed for semantic token prediction. Experiments in zero-shot adaptive TTS scenarios show that SegINR outperforms conventional methods in speech quality with computational efficiency.
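The segment-wise expansion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_inr`, its sine-based features, and the end-of-segment rule are hypothetical stand-ins, since the abstract does not say how the conditional INR is parameterized or how segment boundaries are signaled. The sketch only shows the general idea of querying a function conditioned on one text embedding at successive frame indices until it marks the end of that segment, so that no separate duration predictor is needed.

```python
import math


def toy_inr(embedding, t):
    """Hypothetical stand-in for a conditional INR.

    Maps (text embedding, frame index t) to a frame-level feature and an
    end-of-segment flag. A real model would be a trained network.
    """
    feature = [math.sin(e * (t + 1)) for e in embedding]
    # Assumed stopping rule for illustration only: the segment length is
    # derived from the embedding itself rather than from a duration predictor.
    length = 1 + int(abs(sum(embedding)) * 2)
    end_of_segment = (t + 1) >= length
    return feature, end_of_segment


def expand(embeddings, max_len=10):
    """Expand each text embedding into a segment of frame-level features."""
    frames = []
    boundaries = []  # cumulative frame count at the end of each segment
    for emb in embeddings:
        for t in range(max_len):  # max_len guards against a missing stop flag
            feature, end_of_segment = toy_inr(emb, t)
            frames.append(feature)
            if end_of_segment:
                break
        boundaries.append(len(frames))
    return frames, boundaries


embeddings = [[0.5, 0.5], [1.0, 0.6], [0.1, 0.1]]
frames, boundaries = expand(embeddings)
```

In this toy setting each segment is generated independently of the others, which is the property that would let segments be produced in parallel instead of through autoregressive decoding over the whole sequence.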
Pages: 646-650
Page count: 5