TAG: Guidance-Free Open-Vocabulary Semantic Segmentation

Times cited: 0

Authors:

Kawano, Yasufumi [1]

Aoki, Yoshimitsu [1]

Affiliation:

[1] Keio Univ, Grad Sch Integrated Design Engn, Yokohama, Kanagawa 2238522, Japan

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Semantic segmentation; Training; Databases; Annotations; Task analysis; Semantics; Vocabulary; Computer vision; Classification algorithms; open-vocabulary; zero-guidance;

DOI:

10.1109/ACCESS.2024.3418210

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Semantic segmentation is a crucial task in computer vision, in which each pixel in an image is classified into a category. However, traditional methods face significant challenges, including the need for pixel-level annotations and extensive training. Furthermore, because supervised learning uses a limited set of predefined categories, models typically struggle with rare classes and cannot recognize new ones. Unsupervised and open-vocabulary segmentation, proposed to tackle these issues, face challenges of their own, including the inability to assign specific class labels to clusters and the need for user-provided text queries as guidance. In this context, we propose TAG, a novel approach that achieves Training-, Annotation-, and Guidance-free open-vocabulary semantic segmentation. TAG uses pre-trained models such as CLIP and DINO to segment images into meaningful categories without additional training or dense annotations. It retrieves class labels from an external database, which gives it the flexibility to adapt to new scenarios. TAG achieves state-of-the-art results on PascalVOC, PascalContext, and ADE20K for open-vocabulary segmentation without given class names, e.g., an improvement of +15.3 mIoU on PascalVOC.
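The abstract says only that TAG retrieves class labels from an external database. The sketch below illustrates one plausible reading of that step under a stated assumption: each segment's embedding is matched to the database label whose text embedding has the highest cosine similarity. It is not the authors' implementation; the function names and the toy 3-dimensional vectors (standing in for CLIP image and text embeddings) are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def retrieve_labels(segment_embeddings: np.ndarray,
                    label_embeddings: np.ndarray,
                    label_names: list[str]) -> list[str]:
    """Hypothetical label retrieval: pick the most similar database label per segment.

    segment_embeddings: (S, D) array, one embedding per segment (assumed to come
        from an image encoder such as CLIP's; here just toy vectors).
    label_embeddings: (L, D) array, one text embedding per database entry.
    label_names: list of L label strings aligned with label_embeddings.
    """
    # (S, L) matrix of cosine similarities between segments and database labels
    sims = l2_normalize(segment_embeddings) @ l2_normalize(label_embeddings).T
    best = sims.argmax(axis=1)  # index of the closest label for each segment
    return [label_names[i] for i in best]


# Toy "database" of three labels with orthogonal embeddings (illustrative only)
label_names = ["cat", "dog", "car"]
label_embeddings = np.array([[1.0, 0.0, 0.0],
                             [0.0, 1.0, 0.0],
                             [0.0, 0.0, 1.0]])

# Two toy segment embeddings: one close to "cat", one close to "car"
segments = np.array([[0.9, 0.1, 0.0],
                     [0.1, 0.2, 0.95]])

labels = retrieve_labels(segments, label_embeddings, label_names)
print(labels)
```

Because the label set lives in a database rather than in the model's weights, adding a new category in this sketch means appending one row to `label_embeddings` and one name to `label_names`, with no retraining.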

Pages: 88322-88331

Number of pages: 10
