Exploring the Sim2Real Gap using Digital Twins

Cited by: 1
Authors
Sudhakar, Sruthi [1]
Hanzelka, Jon [2]
Bobillot, Josh [2]
Randhavane, Tanmay [2]
Joshi, Neel [2]
Vineet, Vibhav [2]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Microsoft Res, Redmond, WA USA
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023
DOI
10.1109/ICCV51070.2023.01867
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Creating datasets for training computer vision models is very time-consuming. An emerging alternative is to use synthetic data, but if the synthetic data is not similar enough to the real data, performance is typically below that of training with real data. Thus, using synthetic data still requires a large amount of time, money, and skill, as one needs to author the data carefully. In this paper, we seek to understand which aspects of this authoring process are most critical. We present an analysis of which factors of variation between simulated and real data are most important. We capture images of YCB objects to create a novel YCB-Real dataset. We then create a novel synthetic "digital twin" dataset, YCB-Synthetic, which matches the YCB-Real dataset and includes a variety of artifacts added to the synthetic data. We study the effects of these artifacts on our dataset and two existing published datasets on two different computer vision tasks: object detection and instance segmentation. We provide an analysis of the cost-benefit trade-offs between artist time for fixing artifacts and trained model accuracy. We plan to release this dataset (images and 3D assets) so that it can be further used by the community. Link to dataset(1)
Pages: 20361-20370
Page count: 10