ScanRefer: 3D Object Localization in RGB-D Scans Using Natural Language

Times cited: 131

Authors:

Chen, Dave Zhenyu [1]

Chang, Angel X. [2]

Niessner, Matthias [1]

Affiliations:

[1] Tech Univ Munich, Munich, Germany

[2] Simon Fraser Univ, Burnaby, BC, Canada

Source:

COMPUTER VISION - ECCV 2020, PT XX | 2020 / Vol. 12365


DOI:

10.1007/978-3-030-58565-5_13

CLC (Chinese Library Classification) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We introduce the task of 3D object localization in RGB-D scans using natural language descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free-form description of a specified target object. To address this task, we propose ScanRefer, which learns a fused descriptor from 3D object proposals and encoded sentence embeddings. This fused descriptor correlates language expressions with geometric features, enabling regression of the 3D bounding box of a target object. We also introduce the ScanRefer dataset, containing 51,583 descriptions of 11,046 objects from 800 ScanNet [8] scenes. ScanRefer is the first large-scale effort to perform object localization via natural language expression directly in 3D (Code: https://daveredrum.github.io/ScanRefer/).
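The abstract describes fusing each 3D object proposal with the encoded description and using the fused descriptor to single out the target's bounding box. The Python sketch below illustrates only that general idea, under loudly stated assumptions: the proposal features, sentence embedding, scoring weights, the element-wise-product fusion, and the helper names `fuse`, `softmax`, and `localize` are all hypothetical, and the sketch simply selects the highest-scoring proposal box instead of regressing a refined box. It is not ScanRefer's published architecture.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box encoding: (center_x, center_y, center_z, size_x, size_y, size_z).
Box = Tuple[float, float, float, float, float, float]


def fuse(proposal_feat: Sequence[float], sentence_emb: Sequence[float]) -> List[float]:
    """Element-wise product fusion (an assumption, not the paper's fusion module)."""
    if len(proposal_feat) != len(sentence_emb):
        raise ValueError("proposal feature and sentence embedding must have equal length")
    return [p * s for p, s in zip(proposal_feat, sentence_emb)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-proposal scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def localize(
    proposal_feats: Sequence[Sequence[float]],
    proposal_boxes: Sequence[Box],
    sentence_emb: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> Tuple[Box, List[float]]:
    """Score every 3D proposal against the description and return the best box.

    The scoring head is a single linear layer over the fused descriptor
    (hypothetical: a trained model would learn `weights` and `bias`).
    """
    logits = []
    for feat in proposal_feats:
        fused = fuse(feat, sentence_emb)
        logits.append(sum(w * x for w, x in zip(weights, fused)) + bias)
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return proposal_boxes[best], probs


if __name__ == "__main__":
    # Toy example: three proposals with 2-D features and a 2-D sentence embedding.
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    boxes = [
        (0.0, 0.0, 0.0, 1.0, 1.0, 1.0),
        (2.0, 0.0, 0.0, 1.0, 1.0, 1.0),
        (0.0, 2.0, 0.0, 1.0, 1.0, 1.0),
    ]
    sentence = [0.0, 2.0]
    box, probs = localize(feats, boxes, sentence, weights=[1.0, 1.0])
    print(box)  # prints (2.0, 0.0, 0.0, 1.0, 1.0, 1.0)
```

The element-wise product is chosen here because plain concatenation followed by a single linear layer adds the same sentence-dependent constant to every proposal's score, which cancels in the softmax and so cannot relate the description to any particular object. A nonlinear MLP over concatenated features avoids that problem at the cost of more parameters.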


Pages: 202-221

Number of pages: 20

References (73 in total):

[41] Narita G, 2019, arXiv, DOI arXiv:1903.01177

[42] Nguyen A, 2018, arXiv, DOI arXiv:1803.06152

[43] Paszke A, 2016, arXiv, DOI 10.48550/arXiv.1606.02147

[44] Pennington J, 2014, Proc. 2014 Conf. on Empirical Methods in Natural Language Processing (EMNLP), DOI 10.3115/v1/D14-1162

[45] Conditional Image-Text Embedding Networks [J].

Plummer, Bryan A.; Kordas, Paige; Kiapour, M. Hadi; Zheng, Shuai; Piramuthu, Robinson; Lazebnik, Svetlana.

COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216: 258-274

[46] Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models [J].

Plummer, Bryan A.; Wang, Liwei; Cervantes, Chris M.; Caicedo, Juan C.; Hockenmaier, Julia; Lazebnik, Svetlana.

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2641-2649

[47] Prabhudesai M, 2021, arXiv, DOI arXiv:1910.01210

[48] Deep Hough Voting for 3D Object Detection in Point Clouds [J].

Qi, Charles R.; Litany, Or; He, Kaiming; Guibas, Leonidas J.

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 9276-9285

[49] Qi Charles Ruizhongtai, 2017, PROC 31 INT C NEURAL

[50] REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments [J].

Qi, Yuankai; Wu, Qi; Anderson, Peter; Wang, Xin; Wang, William Yang; Shen, Chunhua; van den Hengel, Anton.

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 9979-9988
