22 entries in total
[1]
[Anonymous], 1993, TIMIT ACOUSTIC PHONE
[2]
Less Is More: Picking Informative Frames for Video Captioning [J]. COMPUTER VISION - ECCV 2018, PT XIII, 2018, 11217: 367-384
[3]
Cho KYHY, 2014, Arxiv, DOI arXiv:1409.1259
[4]
Graves A, 2012, STUD COMPUT INTELL, V385, P1, DOI [10.1162/neco.1997.9.1.1, 10.1007/978-3-642-24797-2]
[5]
Liu LJ, 2018, INTERSPEECH, P1983
[6]
Liu SX, 2018, INTERSPEECH, P496
[7]
Lorenzo-Trueba J., 2018, P SPEAK LANG REC WOR
[8]
Lu H., 2019, 2019 IEEE INT C ACOU
[9]
Investigation of using disentangled and interpretable representations for one-shot cross-lingual voice conversion [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 2833-2837