96 entries in total
[1] [Anonymous], 2010, P 27 INT C MACH LEAR
[2] ViViT: A Video Vision Transformer [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 6816-6826
[3] Bellver M, 2020, Arxiv, DOI arXiv:2010.00263
[4] Boundary Loss for Remote Sensing Imagery Semantic Segmentation [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2019, PT II, 2019, 11555: 388-401
[5] End-to-End Referring Video Object Segmentation with Multimodal Transformers [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 4975-4985
[6] End-to-End Object Detection with Transformers [J]. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229
[7] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4724-4733
[8] See-Through-Text Grouping for Referring Image Segmentation [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 7453-7462
[9] Chen LC, 2017, Arxiv, DOI arXiv:1706.05587