Learning Geometric Information via Transformer Network for Key-Points Based Motion Segmentation

Cited by: 0
Authors
Li, Qiming [1]
Cheng, Jinghang [1]
Gao, Yin [1]
Li, Jun [1]
Affiliations
[1] Chinese Acad Sci, Haixi Inst, Quanzhou Inst Equipment Mfg, Lab Robot & Intelligent Syst, Quanzhou 362216, Fujian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Geometric information embedding; transformer; self-attention; motion segmentation; VIDEO OBJECT SEGMENTATION; MULTIPLE-STRUCTURE DATA; CONSENSUS; TRACKING; GRAPHS;
DOI
10.1109/TCSVT.2024.3382363
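Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract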
With the emergence of Vision Transformers, attention-based modules have demonstrated performance comparable or superior to CNNs on various vision tasks. However, little research has explored the potential of the self-attention module to learn global and local geometric information for key-points based motion segmentation. This paper therefore presents a new method, named GIET, that utilizes geometric information in a Transformer network for key-points based motion segmentation. Specifically, two novel local geometric information embedding modules are developed in GIET. Unlike traditional convolution operators, which model the local geometric information of key-points within a fixed-size spatial neighborhood, the Neighbor Embedding Module (NEM) aggregates the feature maps of the k-Nearest Neighbors (k-NN) of each point, selected according to the semantic similarity between the input key-points. NEM not only augments the network's ability to extract local features from each point's neighborhood, but also characterizes the semantic affinities between points on the same moving object. Furthermore, to investigate the geometric relationships between the points and each motion, a Centroid Embedding Module (CEM) is devised to aggregate the feature maps of the cluster centroids that correspond to the moving objects. CEM captures the semantic similarity between points and the centroids corresponding to the moving objects. Subsequently, the multi-head self-attention mechanism is exploited to learn the global geometric information of all the key-points using the aggregated feature maps obtained from the two embedding modules. Compared to convolution operators or the self-attention mechanism alone, the proposed simple Transformer-like architecture can make fuller use of both the local and global geometric properties of the input sparse key-points. Finally, the motion segmentation task is formulated as a subspace clustering problem using the Transformer architecture. Experimental results on three motion segmentation datasets, KT3DMoSeg, AdelaideRMF, and FBMS, demonstrate that GIET achieves state-of-the-art performance.
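The three ingredients named in the abstract (k-NN neighbor aggregation, centroid aggregation, and self-attention over the aggregated features) can be illustrated with a minimal NumPy sketch. This is not the GIET implementation: the abstract does not specify the similarity measure, the aggregation operator, or the attention configuration, so the function names, the use of squared Euclidean distance in feature space, mean and softmax-weighted aggregation, and single-head attention with identity projections are all assumptions made for illustration.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def pairwise_sq_dist(a, b):
    """Squared Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def neighbor_embedding(features, k):
    """Hypothetical NEM-style step: append the mean feature of each point's
    k nearest neighbours, where nearness is measured in feature space
    (an assumption) rather than in a fixed spatial window."""
    dist = pairwise_sq_dist(features, features)
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :k]     # (N, k) neighbour indices
    neighbor_mean = features[idx].mean(axis=1)
    return np.concatenate([features, neighbor_mean], axis=1)


def centroid_embedding(features, centroids):
    """Hypothetical CEM-style step: append a similarity-weighted mix of the
    centroid features; similarity is negative squared distance (an assumption)."""
    weights = softmax(-pairwise_sq_dist(features, centroids))
    return np.concatenate([features, weights @ centroids], axis=1)


def self_attention(x):
    """Single-head scaled dot-product self-attention with identity projections
    (a simplification of the multi-head mechanism)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores) @ x


# Toy data: two well-separated groups of 4-D point features (synthetic).
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(5, 4))
group_b = rng.normal(5.0, 0.1, size=(5, 4))
features = np.vstack([group_a, group_b])
centroids = np.stack([group_a.mean(axis=0), group_b.mean(axis=0)])

nem = neighbor_embedding(features, k=3)
cem = centroid_embedding(features, centroids)
out = self_attention(np.concatenate([nem, cem], axis=1))
print(nem.shape, cem.shape, out.shape)
```

In this toy setting each point's neighbors and dominant centroid come from its own group, which is the kind of local affinity the abstract attributes to NEM and CEM; the attention step then mixes information across all points.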
Pages: 7856-7869
Number of pages: 14