We propose a machine learning-based methodology suitable for regions where meteorological data are only partially available. In response to growing global energy demand and the need to reduce the carbon footprint of fossil fuels, solar energy has become a central renewable energy source through small- and large-scale photovoltaic systems. Solar energy production depends on the amount of solar irradiation available in a given area, which is influenced by external factors such as environmental conditions, season, and geographic location. Many regions in the Global South do not maintain an up-to-date solar irradiance database, which limits efficient analysis of their solar potential. Meteorological stations can provide high-precision ground measurements. However, each station covers only a specific location, which creates a spatial data availability problem. This problem can be addressed using satellite data, which provide zone-wise spatial information, but satellite measurements are not as accurate as those obtained on the ground. In this work, we use tree-based regression models to map satellite-derived Global Horizontal Irradiation (GHI) to the GHI measured at meteorological stations. We then propose using the data generated through this regression for GHI forecasting, to facilitate applications such as sizing and operating photovoltaic and solar thermal systems. We illustrate the methodology through a case study in which an LSTM neural network trained on the generated dataset outperformed a statistical baseline in short-term irradiance forecasting.