58 entries in total
[21]
Fevotte C., 2005, IRISA Technical Report 1706
[22]
Gabbay A, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P3051, DOI 10.1109/ICASSP.2018.8462527
[23]
Semantic Video CNNs through Representation Warping [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 4463-4472
[24]
Self-supervised Moving Vehicle Tracking with Stereo Sound [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 7052-7061
[25]
Gao RH, 2019, arXiv, arXiv:1904.07750
[26]
2.5D Visual Sound [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 324-333
[27]
Learning to Separate Object Sounds by Watching Unlabeled Video [J]. COMPUTER VISION - ECCV 2018, PT III, 2018, 11207: 36-54
[28]
Memory-Augmented Dense Predictive Coding for Video Representation Learning [J]. COMPUTER VISION - ECCV 2020, PT III, 2020, 12348: 312-329
[29]
Video Representation Learning by Dense Predictive Coding [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019: 1483-1492
[30]
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input [J]. COMPUTER VISION - ECCV 2018, PT VI, 2018, 11210: 659-677