Unsupervised Acoustic-to-Articulatory Inversion with Variable Vocal Tract Anatomy

Cited by: 2
Authors: Sun, Yifan [1]; Huang, Qinlong; Wu, Xihong
Affiliation: [1] Peking Univ, Dept Machine Intelligence, Speech & Hearing Res Ctr, Beijing, Peoples R China
Source: INTERSPEECH 2022
Funding: National Natural Science Foundation of China
Keywords: acoustic-to-articulatory inversion; vocal tract anatomy; adaptation
DOI: 10.21437/Interspeech.2022-477
Chinese Library Classification: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract:
Acoustic and articulatory variability across speakers has long limited the generalization performance of acoustic-to-articulatory inversion (AAI) methods. Speaker-independent AAI (SI-AAI) methods generally focus on transforming acoustic features but rarely consider direct matching in the articulatory space. Unsupervised AAI methods have the potential for better generalization, but they typically use a fixed morphological setting of a physical articulatory synthesizer even for different speakers, which may cause non-negligible articulatory compensation. In this paper, we propose to jointly estimate articulatory movements and vocal tract anatomy during the inversion of speech. An unsupervised AAI framework is employed in which the estimated vocal tract anatomy sets the configuration of a physical articulatory synthesizer, which in turn is driven by the estimated articulatory movements to imitate a given utterance. Experiments show that estimating vocal tract anatomy brings both acoustic and articulatory benefits: acoustically, the reconstruction quality is higher; articulatorily, the estimated articulatory trajectories better match the measured ones. Moreover, the estimated anatomy parameters show clear clustering by speaker, indicating a successful decoupling of speaker characteristics from linguistic content.
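To make the analysis-by-synthesis loop described in the abstract concrete, here is a minimal sketch in PyTorch. It is an illustrative assumption, not the authors' implementation: a toy differentiable stand-in (synthesize, PROJ, ANAT_W, all hypothetical names) replaces the physical articulatory synthesizer, and plain gradient descent jointly fits a static per-speaker anatomy vector and a per-frame articulatory trajectory to a target spectrogram.

import torch

torch.manual_seed(0)
T, N_ART, N_ANAT, N_MEL = 200, 10, 6, 80  # frames, articulators, anatomy dims, mel bins

# Fixed toy maps standing in for a physical articulatory synthesizer.
PROJ = torch.randn(N_ART, N_MEL) / N_ART ** 0.5      # articulation -> spectrum
ANAT_W = torch.randn(N_ANAT, N_MEL) / N_ANAT ** 0.5  # anatomy -> spectral offset

def synthesize(anatomy, traj):
    # Per-frame spectra driven by articulation, shifted by a static
    # speaker-specific anatomy term; a real system would run the
    # physical synthesizer here instead.
    return torch.tanh(traj) @ PROJ + anatomy @ ANAT_W  # shape (T, N_MEL)

# Target spectrogram to imitate (in practice: mel features of real speech).
target = synthesize(torch.randn(N_ANAT), torch.randn(T, N_ART)).detach()

# Jointly optimized variables: one static anatomy vector per speaker,
# one articulatory trajectory per utterance.
anatomy = torch.zeros(N_ANAT, requires_grad=True)
traj = torch.zeros(T, N_ART, requires_grad=True)
opt = torch.optim.Adam([anatomy, traj], lr=0.05)

for step in range(500):
    opt.zero_grad()
    loss = torch.mean((synthesize(anatomy, traj) - target) ** 2)
    # A smoothness prior on the trajectory discourages the jittery
    # articulatory compensation the abstract warns about.
    loss = loss + 1e-3 * torch.mean((traj[1:] - traj[:-1]) ** 2)
    loss.backward()
    opt.step()

print(float(loss))  # reconstruction error after optimization

Because the anatomy vector is held constant over an utterance (and would be shared across a speaker's utterances) while the trajectory varies per frame, speaker characteristics and linguistic content land in separate variables, consistent with the per-speaker clustering of anatomy parameters reported above.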
Pages: 4656-4660 (5 pages)
Related papers (50 in total):
  • [41] Better acoustic normalization in subject independent acoustic-to-articulatory inversion: benefit to recognition
    Afshan, Amber
    Ghosh, Prasanta Kumar
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2016, pp. 5395-5399
  • [42] An episodic memory-based solution for the acoustic-to-articulatory inversion problem
    Demange, Sebastien
    Ouni, Slim
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2013, 133 (5): 2921-2930
  • [43] Representation learning using convolution neural network for acoustic-to-articulatory inversion
    Illa, Aravind
    Ghosh, Prasanta Kumar
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, pp. 5931-5935
  • [44] Temporal Convolution Network Based Joint Optimization of Acoustic-to-Articulatory Inversion
    Sun, Guolun
    Huang, Zhihua
    Wang, Li
    Zhang, Pengyuan
    APPLIED SCIENCES-BASEL, 2021, 11 (19)
  • [45] Speaker conditioned acoustic-to-articulatory inversion using x-vectors
    Illa, Aravind
    Ghosh, Prasanta Kumar
    INTERSPEECH 2020, 2020, pp. 1376-1380
  • [46] Analysis of acoustic-to-articulatory speech inversion across different accents and languages
    Sivaraman, Ganesh
    Espy-Wilson, Carol
    Wieling, Martijn
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), 2017, pp. 974-978
  • [47] A Comparative Study of Articulatory Features From Facial Video and Acoustic-To-Articulatory Inversion for Phonetic Discrimination
    Narwekar, Abhishek
    Ghosh, Prasanta Kumar
    2016 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM), 2016
  • [48] MLLR-PRSW for Kinematic-Independent Acoustic-to-Articulatory Inversion
    Bozorg, Narjes
    Johnson, Michael T.
    2019 IEEE 19TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT 2019), 2019
  • [49] On smoothing articulatory trajectories obtained from Gaussian mixture model based acoustic-to-articulatory inversion
    Ghosh, Prasanta K.
    Narayanan, Shrikanth S.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2013, 134 (2): EL258-EL264
  • [50] Improving the performance of acoustic-to-articulatory inversion by removing the training loss of noncritical portions of articulatory channels dynamically
    Fang, Qiang
    INTERSPEECH 2020, 2020, pp. 1371-1375