67 entries in total
[41]
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention[C]. Proceedings of the 29th ACM International Conference on Multimedia, MM 2021, 2021: 4220-4229
[43]
Masked Autoencoders for Point Cloud Self-supervised Learning[C]. Computer Vision - ECCV 2022, Pt II, 2022, 13662: 604-621
[44]
Poria S, 2016, IEEE DATA MINING, P439, DOI [10.1109/ICDM.2016.0055, 10.1109/ICDM.2016.178]
[45]
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 21064-21074
[46]
LSTA: Long Short-Term Attention for Egocentric Action Recognition[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 9946-9955
[47]
Tateno M, 2024, arXiv, DOI arXiv:2405.01090
[48]
Tong Zhan, 2022, ADV NEURAL INFORM PR
[49]
Tsutsui S, 2021, arXiv
[50]
Vaswani A, 2017, ADV NEUR IN, V30