ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning

Authors
Kaltenborn, Julia [1,2]
Lange, Charlotte Emilie Elektra [1,2]
Ramesh, Venkatesh [1,2]
Brouillard, Philippe [1,2]
Gurwicz, Yaniv [3]
Nagda, Chandni [4]
Runge, Jakob [5,6]
Nowack, Peer [7]
Rolnick, David [1,2]
Affiliations
[1] Mila Quebec AI Inst, Montreal, PQ, Canada
[2] Univ Montreal, Montreal, PQ, Canada
[3] Intel Labs, Hillsboro, OR USA
[4] Univ Illinois, Urbana, IL USA
[5] German Aerosp Ctr, Berlin, Germany
[6] Tech Univ Berlin, Berlin, Germany
[7] Karlsruhe Inst Technol, Karlsruhe, Germany
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
UK Natural Environment Research Council; European Research Council;
Keywords
EARTH SYSTEM MODEL; VARIABILITY; EMISSIONS; SIMULATION;
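DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract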
Climate models have been key for assessing the impact of climate change and simulating future climate scenarios. The machine learning (ML) community has taken an increased interest in supporting climate scientists' efforts on various tasks such as climate model emulation, downscaling, and prediction. Many of those tasks have been addressed on datasets created with single climate models. However, both the climate science and ML communities have suggested that to address those tasks at scale, we need large, consistent, and ML-ready climate model datasets. Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives. In addition, we provide a modular dataset pipeline for retrieving and preprocessing additional climate models and scenarios. We showcase the potential of our dataset by using it as a benchmark for ML-based climate model emulation. We gain new insights about the performance and generalization capabilities of the different ML models by analyzing their performance across different climate models. Furthermore, the dataset can be used to train an ML emulator on several climate models instead of just one. Such a "super-emulator" can quickly project new climate change scenarios, complementing existing scenarios already provided to policymakers. We believe ClimateSet will create the basis needed for the ML community to tackle climate-related tasks at scale.
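The "super-emulator" idea in the abstract, one emulator trained on output from several climate models rather than one, can be illustrated with a deliberately tiny stand-in. This is a toy sketch, not ClimateSet's pipeline or the emulators it benchmarks: it assumes one scalar forcing value x and one scalar climate response y per sample, and fits a slope shared by all climate models plus a separate offset per climate model (a fixed-effects linear regression). The function names fit_super_emulator and predict and the labels model_A and model_B are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_super_emulator(samples):
    """Toy joint fit of y = offset[model] + slope * x over several climate models.

    samples: iterable of (model_name, forcing, response) tuples (hypothetical format).
    Returns (slope, offsets), where offsets maps model_name -> intercept.
    """
    by_model = defaultdict(list)
    for model, x, y in samples:
        by_model[model].append((x, y))

    # Center x and y within each climate model, so the shared slope is
    # estimated only from within-model variation (fixed-effects regression).
    num = den = 0.0
    centers = {}
    for model, pts in by_model.items():
        xbar = mean(x for x, _ in pts)
        ybar = mean(y for _, y in pts)
        centers[model] = (xbar, ybar)
        for x, y in pts:
            num += (x - xbar) * (y - ybar)
            den += (x - xbar) ** 2
    if den == 0:
        raise ValueError("forcing must vary within at least one climate model")

    slope = num / den
    # Each climate model keeps its own baseline (intercept).
    offsets = {m: ybar - slope * xbar for m, (xbar, ybar) in centers.items()}
    return slope, offsets


def predict(slope, offsets, model, x):
    """Project the response of one climate model for a new forcing value x."""
    return offsets[model] + slope * x


# Synthetic data: two hypothetical climate models that share a sensitivity of 2.0
# but have different baselines.
data = [("model_A", x, 0.5 + 2.0 * x) for x in (0.0, 1.0, 2.0)]
data += [("model_B", x, -1.0 + 2.0 * x) for x in (0.0, 1.0, 2.0)]
slope, offsets = fit_super_emulator(data)
```

On this synthetic data the fit recovers a shared slope of 2.0 with offsets 0.5 and -1.0, so predict(slope, offsets, "model_B", 3.0) projects 5.0 for an unseen forcing value. Pooling the models lets them share what they agree on while each keeps its own baseline. A real emulator is far richer than this scalar example.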
Pages: 36