Ringmo-SenseV2: Remote Sensing Foundation Model for Spatiotemporal Prediction Based on Multisource Heterogeneous Time-Series Data

Authors
Xu, Liangyu [1, 2, 3]
Lu, Wanxuan [1, 2]
Hu, Leiyi [1, 2, 3]
Yang, Heming [1, 2, 3]
Jiang, Yi [1, 2, 3]
Liu, Chenglong [1, 2, 3]
Yu, Hongfeng [1, 2]
Deng, Chubo [1, 2]
Sun, Xian [1, 2, 3]
Fu, Kun [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing 100094, Peoples R China
[2] Chinese Acad Sci, Aerosp Informat Res Inst, Key Lab Target Cognit & Applicat Technol, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
Source
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2025 / Vol. 63
Keywords
Foundation models; Videos; Trajectory; Spatiotemporal phenomena; Data models; Predictive models; Autonomous aerial vehicles; Time series analysis; Feature extraction; Earth; Foundation model; remote sensing (RS); spatiotemporal prediction
DOI
10.1109/TGRS.2025.3565203
Chinese Library Classification
P3 [Geophysics]; P59 [Geochemistry]
Discipline classification codes
0708; 070902
Abstract
The rapid development of remote sensing (RS) technology has generated a vast amount of heterogeneous time-series data from various sources, including drone videos, satellite time-series images, and multiobject trajectories. Effectively processing and analyzing this multisource heterogeneous data for accurate spatiotemporal prediction is crucial in fields such as environmental protection and disaster response. In this article, we propose a universal predictive foundation model named Ringmo-SenseV2 to learn the general evolutionary patterns of RS elements from massive heterogeneous data. Ringmo-SenseV2 features a mixture-of-heterogeneous-experts (MoHE) Transformer, which unifies the modeling of multisource heterogeneous time-series data. Additionally, to better capture the complex dependencies across different spatiotemporal locations, we introduce a hypergraph translator (HT), treating embeddings of different spatiotemporal locations as nodes and employing hypergraph convolution for information propagation. Furthermore, to enhance the model's adaptability to different evolution speeds during pretraining, we implement the adaptive tube masking (AM) strategy, which controls prediction difficulty by adaptively setting mask proportions for sequences with varying evolution speeds. Extensive experiments demonstrate that Ringmo-SenseV2 exhibits outstanding performance across various RS prediction tasks. Further tests on scene graph generation for RS images showcase the model's ability to extract image features, thereby enhancing image perception tasks.
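The abstract does not give the exact formulation of the hypergraph translator (HT), so the following is only a minimal sketch of the general idea it names: token embeddings act as nodes, and information propagates through hyperedges by hypergraph convolution. It assumes the widely used normalized form X' = D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X Θ. The function name `hypergraph_conv`, the toy incidence matrix, and the way nodes are grouped into hyperedges are all hypothetical and are not taken from the paper.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_weights=None):
    """One normalized hypergraph convolution step (illustrative, not the paper's HT).

    X:     (n_nodes, d_in)  node features, e.g. spatiotemporal token embeddings
    H:     (n_nodes, n_edges) incidence matrix, H[v, e] = 1 if node v is in hyperedge e
    Theta: (d_in, d_out)    projection matrix (learnable in a real model)
    """
    H = np.asarray(H, dtype=float)
    n_edges = H.shape[1]
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)

    dv = H @ w            # weighted node degrees
    de = H.sum(axis=0)    # hyperedge degrees (number of member nodes)

    # Invert degrees safely so isolated nodes or empty hyperedges contribute zero.
    dv_inv_sqrt = np.zeros_like(dv)
    dv_inv_sqrt[dv > 0] = dv[dv > 0] ** -0.5
    de_inv = np.zeros_like(de)
    de_inv[de > 0] = 1.0 / de[de > 0]

    Hn = dv_inv_sqrt[:, None] * H        # D_v^{-1/2} H
    A = (Hn * (w * de_inv)) @ Hn.T       # D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2}
    return A @ X @ Theta

# Toy example (assumed grouping): 4 tokens, hyperedge 0 = {0, 1, 2}, hyperedge 1 = {2, 3}.
H_demo = np.array([[1, 0],
                   [1, 0],
                   [1, 1],
                   [0, 1]])
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4, 8))
Theta_demo = rng.normal(size=(8, 8))
X_out = hypergraph_conv(X_demo, H_demo, Theta_demo)
print(X_out.shape)
```

Unlike an ordinary graph edge, a hyperedge can connect any number of nodes, so a single propagation step can mix information among a whole group of spatiotemporal locations at once.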
Pages: 18