A Joint-Training Two-Stage Method For Remote Sensing Image Captioning

Cited by: 26
Authors
Ye, Xiutiao [1 ]
Wang, Shuang [1 ]
Gu, Yu [1 ]
Wang, Jihui [1 ]
Wang, Ruixuan [1 ]
Hou, Biao [1 ]
Giunchiglia, Fausto [2 ]
Jiao, Licheng [1 ]
Affiliations
[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ China, Xian 710071, Peoples R China
[2] Univ Trento, Dept Informat Engn & Comp Sci, I-38123 Trento, Italy
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2022 / Vol. 60
Funding
National Natural Science Foundation of China;
Keywords
Image captioning; image understanding; joint training; multilabel attributes; remote sensing image (RSI); MODELS;
DOI
10.1109/TGRS.2022.3224244
Chinese Library Classification number
P3 [Geophysics]; P59 [Geochemistry];
Discipline classification code
0708 ; 070902 ;
Abstract
Compared with remote sensing image (RSI) captioning methods based on the traditional encoder-decoder model, two-stage RSI captioning methods include an auxiliary remote sensing task to provide prior information, which enables them to generate more accurate descriptions. In previous two-stage RSI captioning methods, however, the image captioning and the auxiliary remote sensing tasks are handled separately, which is time-consuming and ignores mutual interference between tasks. To solve this problem, we propose a novel joint-training two-stage (JTTS) RSI captioning method. We use multilabel classification to provide prior information, and we design a differentiable sampling operator to replace the traditional nondifferentiable sampling operation to index the multilabel classification result. In contrast to previous two-stage RSI captioning methods, our method can implement joint training, and the joint loss allows the error of the generated description to flow into the optimization of the multilabel classification via backpropagation. Specifically, we approximate the Heaviside step function with the steep logistic function to implement a differentiable sampling operator for the multilabel classification. We propose a dynamic contrast loss function for multilabel classification tasks to ensure that a certain margin is maintained between the probabilities of the positive label and the negative label during sampling. We design an attribute-guided decoder to filter the multilabel prior information obtained by the sampling operator to generate more accurate image captions. The results of extensive experiments show that the JTTS method achieves state-of-the-art performance on the RSI captioning dataset (RSICD), the University of California, Merced (UCM)-captions, and the Sydney-captions datasets.
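The abstract's idea of approximating the Heaviside step function with a steep logistic function can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so the steepness `k`, the `threshold`, and all function names below are hypothetical choices for illustration only, not the authors' implementation.

```python
import math

def heaviside(x):
    # Hard step: its gradient is zero almost everywhere, so no error signal
    # can flow back through a hard label selection.
    return 1.0 if x > 0 else 0.0

def steep_logistic(x, k=50.0):
    # Smooth surrogate sigmoid(k * x); larger k gives a sharper step.
    # k = 50 is an assumed value, not taken from the paper.
    z = k * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)

def steep_logistic_grad(x, k=50.0):
    # Analytic derivative k * s * (1 - s): nonzero near the threshold,
    # which is what lets a captioning loss reach the classifier.
    s = steep_logistic(x, k)
    return k * s * (1.0 - s)

def soft_select(probs, threshold=0.5, k=50.0):
    # Soft "sampling" of multilabel outputs: close to 1 for labels whose
    # probability exceeds the threshold, close to 0 otherwise.
    return [steep_logistic(p - threshold, k) for p in probs]

weights = soft_select([0.9, 0.1, 0.55])
```

Under these assumptions, confident labels map to weights near 1 or 0, while labels near the threshold keep an intermediate weight and a large gradient. This also suggests why a margin between positive and negative label probabilities would matter during sampling.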
Pages: 16