First Steps Toward Synthetic Sample Generation for Machine Learning Based Flare Forecasting

Cited by: 2
Authors
Hostetter, Maxwell [1]
Angryk, Rafal A. [1]
Affiliations
[1] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30303 USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2020
Funding
National Science Foundation (USA);
Keywords
statistical time series features; flare prediction; data generation; data mining; SMOTE;
DOI
10.1109/BigData50022.2020.9377986
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The imbalanced class problem is intrinsic to solar flare forecasting, as are other issues we find in data-driven forecasting problems that are often hidden within an imbalanced dataset. One method of dealing with imbalanced data is to balance the data by using synthetic oversampling to create synthetic examples of the minority class. Though synthetic oversampling techniques have been applied to problems in medicine, finance, security, and other areas, we have not seen these approaches used in solar flare forecasting. We investigate two methods of synthetic oversampling, Rapidly Converging Gibbs Sampler (RACOG) and Synthetic Minority Oversampling Technique (SMOTE). We devise three naive synthetic oversampling techniques for comparison. We rely on data provided by the Space Weather ANalytics for Solar Flares (SWAN-SF) benchmark dataset. Our results indicate that synthetic oversampling can be effective for machine learning based solar flare forecasting.
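As an illustration of the SMOTE idea named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the usual formulation: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen only for this example; RACOG and the paper's naive baselines are not shown.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position along the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two made-up features per sample.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it always lies within the range spanned by the minority class in each feature. In practice an established library implementation would normally be preferred over a hand-written one.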
Pages: 4208 - 4217
Number of pages: 10
References
24 in total
  • [1] Ahmadzadeh A, 2019, IEEE INT CONF BIG DA, P1423, DOI 10.1109/BigData47090.2019.9006505
  • [2] Multivariate time series dataset for space weather data analytics
    Angryk, Rafal A.
    Martens, Petrus C.
    Aydin, Berkay
    Kempton, Dustin
    Mahajan, Sushant S.
    Basodi, Sunitha
    Ahmadzadeh, Azim
    Cai, Xumin
    Filali Boubrahimi, Soukaina
    Hamdi, Shah Muhammad
    Schuh, Michael A.
    Georgoulis, Manolis K.
    [J]. SCIENTIFIC DATA, 2020, 7 (01)
  • [3] [Anonymous], PRACTICAL GUIDE SUPP
  • [4] [Anonymous], **DATA OBJECT**, DOI 10.7910/DVN/EBCFKM
  • [5] TOWARD RELIABLE BENCHMARKING OF SOLAR FLARE FORECASTING METHODS
    Bloomfield, D. Shaun
    Higgins, Paul A.
    McAteer, R. T. James
    Gallagher, Peter T.
    [J]. ASTROPHYSICAL JOURNAL LETTERS, 2012, 747 (02)
  • [6] SOLAR FLARE PREDICTION USING SDO/HMI VECTOR MAGNETIC FIELD DATA WITH A MACHINE-LEARNING ALGORITHM
    Bobra, M. G.
    Couvidat, S.
    [J]. ASTROPHYSICAL JOURNAL, 2015, 798 (02)
  • [7] The Helioseismic and Magnetic Imager (HMI) Vector Magnetic Field Pipeline: SHARPs - Space-Weather HMI Active Region Patches
    Bobra, M. G.
    Sun, X.
    Hoeksema, J. T.
    Turmon, M.
    Liu, Y.
    Hayashi, K.
    Barnes, G.
    Leka, K. D.
    [J]. SOLAR PHYSICS, 2014, 289 (09) : 3549 - 3578
  • [8] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    [J]. 2002, American Association for Artificial Intelligence (16)
  • [9] Chawla NV, 2004, ACM SIGKDD Explor. Newsl., V6, P1, DOI 10.1145/1007730.1007733
  • [10] Chen Chao, 2004, Using random forest to learn imbalanced data