MVImgNet: A Large-scale Dataset of Multi-view Images

Cited by: 31

Authors:

Yu, Xianggang [1,2]

Xu, Mutian [1,2]

Zhang, Yidan [1,2]

Liu, Haolin [1,2]

Ye, Chongjie [1,2]

Wu, Yushuang [1,2]

Yan, Zizheng [1,2]

Zhu, Chenming [1,2]

Xiong, Zhangyang [1,2]

Liang, Tianyou [1,2]

Chen, Guanying [1,2]

Cui, Shuguang [1,2]

Han, Xiaoguang [1,2]

Affiliations:

[1] CUHKSZ, FNii, Shenzhen, Peoples R China

[2] CUHKSZ, SSE, Shenzhen, Peoples R China

Source:

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2023

Funding:

National Key Research and Development Program of China

Keywords:

DOI:

10.1109/CVPR52729.2023.00883

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Being data-driven is one of the most iconic properties of deep learning algorithms. The birth of ImageNet [24] drove a remarkable trend of 'learning from large-scale data' in computer vision. Pretraining on ImageNet to obtain rich universal representations has been shown to benefit various 2D visual tasks and has become a standard in 2D vision. However, because collecting real-world 3D data is laborious, there is not yet a generic dataset serving as a counterpart of ImageNet in 3D vision, so how such a dataset could impact the 3D community remains unexplored. To remedy this, we introduce MVImgNet, a large-scale dataset of multi-view images that is highly convenient to collect by shooting videos of real-world objects in daily life. It contains 6.5 million frames from 219,188 videos covering objects from 238 classes, with rich annotations of object masks, camera parameters, and point clouds. The multi-view attribute endows our dataset with 3D-aware signals, making it a soft bridge between 2D and 3D vision. We conduct pilot studies to probe the potential of MVImgNet on a variety of 3D and 2D visual tasks, including radiance field reconstruction, multi-view stereo, and view-consistent image understanding, where MVImgNet demonstrates promising performance, leaving many possibilities for future exploration. In addition, via dense reconstruction on MVImgNet, we derive a 3D object point cloud dataset called MVPNet, covering 87,200 samples from 150 categories, with a class label on each point cloud. Experiments show that MVPNet can benefit real-world 3D object classification while posing new challenges to point cloud understanding. MVImgNet and MVPNet will be made public, in the hope of inspiring the broader vision community.

Pages: 9150-9161

Page count: 12
