Satellite Component Semantic Segmentation: Video Dataset and Real-Time Pyramid Attention and Decoupled Attention Network

Cited by: 4
Authors
Shao, Yadong [1,2]
Wu, Aodi [2 ]
Li, Shengyang [3,4]
Shu, Leizheng [3,4]
Wan, Xue [3,4]
Shao, Yuanbin
Huo, Junyan [5 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100049, Peoples R China
[2] Univ Chinese Acad Sci, Sch Aeronaut & Astronaut, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, Technol & Engn Ctr Space Utilizat, Beijing 100094, Peoples R China
[4] Chinese Acad Sci, Key Lab Space Utilizat, Beijing 100094, Peoples R China
[5] Univ Warwick, Dept Comp Sci, Coventry CV4 8UW, England
Keywords
Satellites; Semantic segmentation; Task analysis; Real-time systems; Streaming media; Solid modeling; Semantics;
DOI
10.1109/TAES.2023.3282608
Chinese Library Classification number
V [Aeronautics, Astronautics];
Discipline classification code
08; 0825;
Abstract
High-accuracy, real-time semantic segmentation of satellite components can locate the key components to be operated on during on-orbit servicing, such as solar panels, which is of great significance for navigation and control. Two main challenges remain unsolved. First, satellite component semantic segmentation algorithms require a large number of training images, but on-orbit satellite images are difficult to obtain, and a large-scale satellite component video dataset is especially hard to assemble. Second, high-accuracy semantic segmentation networks require relatively large computational resources, which are difficult to provide in on-orbit tasks. The key aim of this article is to build a satellite component semantic segmentation network that meets the requirements of both high accuracy and real-time on-orbit operation. To this end, a simulated satellite component dataset is proposed; it consists of 98 video sequences of 13 satellites, 32,402 frames in total, with complex backgrounds, various on-orbit illumination conditions, and common satellite motions. The article also proposes an attention-based real-time network, the Pyramid Attention and Decoupled Attention Network (PADAN), which comes in an image-based version, PADAN-S, and a video-based version, PADAN-T. PADAN-S, which is based on AttaNet, mainly performs pyramid attention calculation on three-layer pyramid features and then performs decoupled attention calculation that considers both row and column attention. PADAN-T uses a part of PADAN-S to obtain temporal pyramid features from temporal frames and then performs decoupled attention calculations between the features of the output frame and the features at each layer of the temporal pyramid.
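The abstract states only that the decoupled attention considers row and column attention separately; it gives no formulas. As a rough illustration of that general idea, the following is a minimal NumPy sketch of generic row-wise and column-wise self-attention over a feature map. It is not the PADAN implementation: the function names, the dot-product similarity, the scaling, and the residual sum are all assumptions made for this sketch.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def row_attention(feat):
    """Each position attends only to positions in its own row.

    feat has shape (H, W, C). Hypothetical helper, not from the paper.
    """
    scale = 1.0 / np.sqrt(feat.shape[-1])
    # scores[h, i, j]: similarity between columns i and j within row h
    scores = np.einsum("hic,hjc->hij", feat, feat) * scale
    weights = softmax(scores, axis=-1)
    return np.einsum("hij,hjc->hic", weights, feat)


def column_attention(feat):
    """Each position attends only to positions in its own column."""
    # Swap H and W so columns become rows, reuse row_attention, swap back.
    return row_attention(feat.transpose(1, 0, 2)).transpose(1, 0, 2)


def decoupled_attention(feat):
    """Combine row and column attention (residual sum is an assumption)."""
    return feat + row_attention(feat) + column_attention(feat)


def attention_score_counts(h, w):
    """Number of pairwise scores: full 2-D attention vs. row plus column."""
    full = (h * w) ** 2
    decoupled = h * w * (h + w)
    return full, decoupled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 6, 8))
    print(decoupled_attention(feat).shape)
    print(attention_score_counts(4, 6))
```

The output keeps the input shape `(H, W, C)`. The helper `attention_score_counts` shows why splitting attention along the two axes is attractive for a real-time network: the number of pairwise scores grows as `H*W*(H+W)` instead of `(H*W)**2`.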
The experimental results show that PADAN-S and PADAN-T achieve higher accuracy than other state-of-the-art real-time algorithms in both image-based and video-based satellite component semantic segmentation tasks on simulation datasets, and that the proposed dataset simulates the real on-orbit environment to a certain degree. PADAN-S achieves a speed of 10.25 frames per second at an image resolution of 1280 × 720 pixels on the Jetson Xavier edge computing device, and PADAN-T achieves a speed of 7.18 frames per second.
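To put the reported speeds in terms of a per-frame time budget, the short sketch below converts frames per second to milliseconds per frame. The frame rates and the 1280 × 720 resolution come from the abstract; the helper names are hypothetical. The reported rates correspond to roughly 97.6 ms per frame for PADAN-S and roughly 139.3 ms per frame for PADAN-T.

```python
def frame_time_ms(fps):
    """Average processing time per frame, in milliseconds."""
    return 1000.0 / fps


def pixels_per_second(fps, width, height):
    """Pixel throughput at a given frame rate and resolution."""
    return fps * width * height


# Speeds reported in the abstract (frames per second).
PADAN_S_FPS = 10.25
PADAN_T_FPS = 7.18

if __name__ == "__main__":
    print(f"PADAN-S: {frame_time_ms(PADAN_S_FPS):.1f} ms per frame")
    print(f"PADAN-T: {frame_time_ms(PADAN_T_FPS):.1f} ms per frame")
    # The resolution is stated in the abstract for PADAN-S only.
    print(f"PADAN-S: {pixels_per_second(PADAN_S_FPS, 1280, 720):.0f} pixels/s")
```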
Pages: 7315 - 7333
Number of pages: 19