Text-Vision Relationship Alignment for Referring Image Segmentation

Cited by: 0
Authors
Mingxing Pu
Bing Luo
Chao Zhang
Li Xu
Fayou Xu
Mingming Kong
Affiliations
[1] Xihua University, School of Computer and Software Engineering
[2] Sichuan Police College, Key Laboratory of Intelligent Policing
[3] Xihua University, School of Science
Source
Neural Processing Letters, Vol. 56
Keywords
Semantic parsing; Text-vision alignment; Referring image segmentation
DOI
Not available
Abstract
Referring image segmentation aims to segment an object in an image based on a referring expression. Its difficulty lies in aligning the semantics of the expression with visual instances. Existing methods based on semantic reasoning are limited by the performance of an external syntax parser and do not explicitly explore the relationships between visual instances. This article proposes an end-to-end method for referring image segmentation that aligns 'linguistic relationships' with 'visual relationships' and does not rely on an external syntax parser for expression parsing. The Semantic Component Parser (SCP) adaptively and structurally parses the expression into three components, 'subject', 'object', and 'linguistic relationship', in a learnable manner. The Instances Activation Map Module (IAM) locates multiple visual instances based on the subject and object. The Relationship Based Visual Localization Module (RBVL) first enables each instance in the image to learn global knowledge, then decodes the visual relationships between these instances, and finally aligns the visual relationships with the linguistic relationships to locate the target object more accurately. Experimental results show that the proposed method improves performance by 4–9% over the baseline method on multiple referring image segmentation datasets.
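The two ideas in the abstract, learnable soft parsing of the expression and alignment of linguistic with visual relationships, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`soft_parse`, `align`), the attention-pooling formulation, the cosine-similarity matching, and all vectors below are assumptions chosen only to make the idea concrete.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_parse(word_vecs, component_queries):
    """Pool word vectors into one vector per component.

    Hypothetical stand-in for a learnable parser: each component query
    attends over all words with softmax weights, so no external syntax
    parser is involved and the operation stays differentiable.
    """
    dim = len(word_vecs[0])
    pooled = {}
    for name, query in component_queries.items():
        weights = softmax([dot(query, w) for w in word_vecs])
        pooled[name] = [
            sum(wt * w[d] for wt, w in zip(weights, word_vecs))
            for d in range(dim)
        ]
    return pooled


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def align(linguistic_rel, visual_rels):
    """Return the instance pair whose (assumed) visual relationship
    vector is most similar to the linguistic relationship vector."""
    return max(visual_rels, key=lambda pair: cosine(linguistic_rel, visual_rels[pair]))


# Toy expression of three tokens, e.g. "cat", "left-of", "dog" (made-up embeddings).
word_vecs = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
# In a real model these queries would be learned; here they are fixed by hand.
component_queries = {
    "subject": [1.0, 0.0, 0.0],
    "relationship": [0.0, 1.0, 0.0],
    "object": [0.0, 0.0, 1.0],
}
parsed = soft_parse(word_vecs, component_queries)

# Made-up relationship vectors for two ordered pairs of detected instances.
visual_rels = {
    ("inst0", "inst1"): [0.0, 1.0, 0.0],
    ("inst1", "inst0"): [1.0, 0.0, 0.0],
}
best_pair = align(parsed["relationship"], visual_rels)
```

In this toy setup the relationship component concentrates on the second token, so the pair whose visual relationship vector points in the same direction is selected. A trained model would learn both the queries and the relationship decoder rather than fixing them by hand.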