A Lightweight Enhancement Approach for Real-Time Semantic Segmentation by Distilling Rich Knowledge from Pre-Trained Vision-Language Model

Authors
Lin, Chia-Yi [1]
Chen, Jun-Cheng [2]
Wu, Ja-Ling [1, 3]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
[3] Natl Taiwan Univ, Grad Inst Networking & Multimedia, Taipei, Taiwan
Keywords
CLIP; real-time; semantic segmentation; vision-language pre-training
DOI
10.1561/116.20240015
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
In this work, we propose a lightweight approach to enhance real-time semantic segmentation by leveraging pre-trained vision-language models, specifically using the text encoder of Contrastive Language-Image Pretraining (CLIP) to generate rich semantic embeddings for text labels. Our method then distills this textual knowledge into the segmentation model, integrating the image and text embeddings to align visual and textual information. Additionally, we implement learnable prompt embeddings for better class-specific semantic comprehension. We propose a two-stage training strategy for efficient learning: the segmentation backbone first learns from fixed text embeddings, and the prompt embeddings are then optimized to streamline the learning process. Extensive evaluations and ablation studies show that our approach effectively improves the semantic segmentation model's performance over the compared methods.
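The alignment of image and text embeddings mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the temperature value, and the random vectors standing in for CLIP text-encoder outputs are all assumptions made for illustration. Each pixel embedding is scored against one text embedding per class by cosine similarity, and the best-scoring class becomes the predicted label.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale vectors along the last axis to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def pixel_text_logits(pixel_emb, text_emb, temperature=0.07):
    # pixel_emb: (H, W, D) per-pixel features from a segmentation model.
    # text_emb:  (C, D) one embedding per class label.
    # Returns (H, W, C) cosine similarities divided by a temperature.
    # The temperature value is an assumption, not taken from the paper.
    pixels = l2_normalize(pixel_emb)
    texts = l2_normalize(text_emb)
    return pixels @ texts.T / temperature

def predict_labels(pixel_emb, text_emb):
    # Pick the class whose text embedding is most similar to each pixel.
    return np.argmax(pixel_text_logits(pixel_emb, text_emb), axis=-1)

# Hypothetical stand-ins: random vectors replace real CLIP text embeddings.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(3, 16))       # 3 classes, 16-dim embeddings
class_map = np.array([[0, 1], [2, 1]])    # ground-truth class of each pixel
pixel_emb = 2.0 * text_emb[class_map]     # pixels perfectly aligned with their class
logits = pixel_text_logits(pixel_emb, text_emb)
labels = predict_labels(pixel_emb, text_emb)
```

In a distillation setting, logits of this form would typically feed a per-pixel cross-entropy loss so that the segmentation features move toward the text embedding of their class; how the paper combines this with its learnable prompts and two-stage schedule is not specified in the abstract.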
Pages: 26