Toward Real-World Multi-View Object Classification: Dataset, Benchmark, and Analysis

Cited by: 0
Authors
Wang, Ren [1]
Kim, Tae Sung [2]
Kim, Jin-Sung [2]
Lee, Hyuk-Jae [1]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Sun Moon Univ, Dept Elect Engn, Asan 31460, South Korea
Keywords
Convolution; Benchmark testing; Feature extraction; Circuits and systems; Annotations; Transformers; Neural networks; Multi-view object classification; learning from noisy labels; hidden stratification; dataset; benchmark; 3D; REPRESENTATION; RETRIEVAL; NETWORK; DEEP; RECOGNITION
DOI
10.1109/TCSVT.2024.3359681
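Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification code
0808; 0809
Abstract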
Aggregating information from multiple views is essential to accurately identifying similar objects. Nevertheless, existing datasets have limitations that hinder the development of practical multi-view object classification methods for real-world scenarios. These limitations include synthetic and coarse-grained objects in the datasets and the absence of a validation split for standard hyperparameter tuning. This paper proposes a new dataset, MVP-N (Multi-View, Retail Products, Label Noise), which contains 16k real captured views and 9k multi-view sets collected from 44 retail products. In MVP-N, each view is annotated with a human-perceived information quantity (HPIQ) for analyzing how views are utilized in information aggregation. Moreover, the fine-grained categorization of objects provides inter-class view similarity and intra-class view variance, enabling research on learning from noisy labels in multi-view images. Finally, a new soft label scheme, HS-HPIQ, is proposed that accounts for the hidden stratification phenomenon in multi-view images and achieves superior performance. To assess the effectiveness of MVP-N and the proposed HS-HPIQ, this study reviews 50 recent multi-view-based methods with regard to their practicality in real-world scenarios. Six feature aggregation methods and twelve soft label methods are benchmarked on MVP-N and analyzed in depth. The dataset and code are publicly available at https://github.com/SMNUResearch/MVP-N.
Pages: 5653-5664
Number of pages: 12