Toward Real-World Multi-View Object Classification: Dataset, Benchmark, and Analysis

Cited by: 0

Authors:

Wang, Ren [1]

Kim, Tae Sung [2]

Kim, Jin-Sung [2]

Lee, Hyuk-Jae [1]

Affiliations:

[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea

[2] Sun Moon Univ, Dept Elect Engn, Asan 31460, South Korea

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024, Vol. 34, No. 7

Keywords:

Convolution; Benchmark testing; Feature extraction; Circuits and systems; Annotations; Transformers; Neural networks; Multi-view object classification; learning from noisy labels; hidden stratification; dataset; benchmark; 3D; REPRESENTATION; RETRIEVAL; NETWORK; DEEP; RECOGNITION;

DOI:

10.1109/TCSVT.2024.3359681

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Aggregating information from multiple views is essential to accurately identifying similar objects. Nevertheless, existing datasets have limitations that hinder the development of practical multi-view object classification methods for real-world scenarios. The limitations include synthetic and coarse-grained objects in the datasets and the absence of a validation split to enable standard hyperparameter tuning. This paper proposes a new dataset, MVP-N (Multi-View, Retail Products, Label Noise), which contains 16k real captured views and 9k multi-view sets collected from 44 retail products. In MVP-N, each view is annotated with a human-perceived information quantity (HPIQ) for analyzing how views are utilized in information aggregation. Moreover, the fine-grained categorization of objects provides the inter-class view similarity and intra-class view variance, enabling the research on learning from noisy labels of the multi-view images. Finally, a new soft label scheme, HS-HPIQ, is proposed considering the hidden stratification phenomenon in the multi-view images and achieves superior performance. To assess the effectiveness of MVP-N and the proposed HS-HPIQ, this study overviews 50 recent multi-view-based methods regarding their practicality in real-world scenarios. Six feature aggregation methods and twelve soft label methods are benchmarked on MVP-N with a deep analysis. The dataset and code are publicly available at https://github.com/SMNUResearch/MVP-N.


Pages: 5653-5664

Number of pages: 12

