TAG: Guidance-Free Open-Vocabulary Semantic Segmentation

Cited by: 0

Authors:

Kawano, Yasufumi [1]

Aoki, Yoshimitsu [1]

Affiliation:

[1] Keio Univ, Grad Sch Integrated Design Engn, Yokohama, Kanagawa 2238522, Japan

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Semantic segmentation; Training; Databases; Annotations; Task analysis; Semantics; Vocabulary; Computer vision; Classification algorithms; open-vocabulary; zero-guidance

DOI:

10.1109/ACCESS.2024.3418210

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Semantic segmentation is a crucial task in computer vision, where each pixel in an image is classified into a category. However, traditional methods face significant challenges, including the need for pixel-level annotations and extensive training. Furthermore, because supervised learning uses a limited set of predefined categories, models typically struggle with rare classes and cannot recognize new ones. Unsupervised and open-vocabulary segmentation methods, proposed to tackle these issues, face their own challenges, including the inability to assign specific class labels to clusters and the necessity of user-provided text queries for guidance. In this context, we propose a novel approach, TAG, which achieves Training, Annotation, and Guidance-free open-vocabulary semantic segmentation. TAG utilizes pre-trained models such as CLIP and DINO to segment images into meaningful categories without additional training or dense annotations. It retrieves class labels from an external database, providing flexibility to adapt to new scenarios. TAG achieves state-of-the-art results on PascalVOC, PascalContext and ADE20K for open-vocabulary segmentation without given class names, including an improvement of +15.3 mIoU on PascalVOC.
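The abstract states that class labels are retrieved from an external database rather than supplied by the user. As a rough illustration of that idea only, the sketch below assigns a label to a segment by nearest-neighbor search over cosine similarity. Everything in it is an assumption made for illustration: the function names `cosine` and `retrieve_label`, the three-dimensional toy embeddings, and the dictionary-based database are hypothetical and are not taken from the paper, which may retrieve labels quite differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_label(segment_embedding, database):
    """Return (label, score) for the database entry most similar to the segment.

    `database` maps a candidate label to an embedding vector. In a real system
    these vectors would come from a pre-trained encoder; here they are toy
    values (assumption).
    """
    best_label, best_score = None, -1.0
    for candidate, embedding in database.items():
        score = cosine(segment_embedding, embedding)
        if score > best_score:
            best_label, best_score = candidate, score
    return best_label, best_score


# Hypothetical toy database: label -> embedding.
toy_database = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
    "sky": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of one image segment.
segment = [0.8, 0.2, 0.1]

label, score = retrieve_label(segment, toy_database)
print(label, round(score, 3))
```

Because the label set lives in the database rather than in the model, adding a new category in this sketch only requires adding one more entry to `toy_database`, which is the kind of flexibility the abstract describes.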


Pages: 88322-88331

Page count: 10

