Multi-Level Segmentation Data Generation Based on a Scene-Specific Word Tree

Times Cited: 0
Authors
Kim, Soomin [1 ]
Park, Juyoun [1 ]
Affiliations
[1] Korea Inst Sci & Technol, Seoul 02792, South Korea
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Image segmentation; Semantics; Visualization; Semantic segmentation; Training; Image recognition; Data models; Segmentation; semantic grouping; language hierarchy; dataset generation; multi-level analysis
DOI
10.1109/ACCESS.2024.3418515
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
We humans perceive scenes using pre-learned language categories. Our vocabulary system inherently possesses a hierarchy, which helps us understand scenes at multiple levels. For example, when a person passes by chairs and desks from a distance rather than interacting with them up close, the objects are perceived from a broader perspective and recognized at a higher category level, as furniture. In this work, we propose a multi-level semantic segmentation data generation method based on a scene-specific word tree that mimics human multi-level scene recognition. Multi-level semantic segmentation data encompass diverse levels of grouped segmented regions with varying degrees of detail, from the finest level of conventional semantic segmentation to coarser levels. Our scene-specific word trees leverage linguistic hierarchies to group scene components by considering the relationships between the words present in the scene. Furthermore, in the proposed data generation method, each word tree is constructed within a single image, allowing objects to be grouped into user-selected levels while taking into account the relative relationships between objects in that scene. We demonstrate the effectiveness of our data generation method by building a multi-level scene segmentation network and training it on the generated dataset, which reflects the scene-specific word tree.
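The core idea in the abstract, grouping fine-grained segmentation labels into coarser categories via a linguistic hierarchy, can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the hypernym chains here are hardcoded toy examples (the paper derives its word trees per image from an actual linguistic hierarchy), and the `group_labels` and `relabel_mask` helpers are hypothetical names.

```python
# Toy hypernym chains (root -> ... -> leaf); an assumption for illustration.
# In the paper, such chains would come from a linguistic hierarchy and be
# assembled into a word tree specific to each scene.
HYPERNYM_CHAINS = {
    "chair":  ["entity", "artifact", "furniture", "seat", "chair"],
    "desk":   ["entity", "artifact", "furniture", "table", "desk"],
    "sofa":   ["entity", "artifact", "furniture", "seat", "sofa"],
    "person": ["entity", "organism", "person"],
}

def group_labels(labels, level):
    """Map each fine label to its ancestor at depth `level`.

    The depth is clamped to the chain length, so a label shallower than
    the requested level simply stays at its finest category.
    """
    return {
        lab: HYPERNYM_CHAINS[lab][min(level, len(HYPERNYM_CHAINS[lab]) - 1)]
        for lab in labels
    }

def relabel_mask(mask, mapping):
    """Rewrite a per-pixel label mask (rows of label strings) at the coarser level."""
    return [[mapping[px] for px in row] for row in mask]

# At level 2, chairs, desks, and sofas all merge into "furniture",
# mimicking the distant-view example from the abstract.
mapping = group_labels(["chair", "desk", "sofa"], level=2)
coarse = relabel_mask([["chair", "desk"], ["desk", "sofa"]], mapping)
```

Here the user-selected level plays the role of the viewing granularity: a deeper level keeps the conventional fine-grained labels, while a shallower one yields the coarser grouped segmentation the paper generates as training data.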
Pages: 88202-88215 (14 pages)