Separation performed for five BSRN sites (MNM, tropical; ASP, dry; CAR, temperate; BON, continental; SON, polar) using one year of data per site, totalling ≈1 million samples.
A Python implementation of the GISPLIT global solar irradiance component-separation model for 1-min data, described in:
Ruiz-Arias, J.A. and Gueymard, C.A. (2024) GISPLIT: High-performance global solar irradiance component-separation model dynamically constrained by 1-min sky conditions. Solar Energy Vol. 269, 112363. doi: 10.1016/j.solener.2024.112363 (open access)
GISPLIT relies on the CAELUS sky-condition classification algorithm for 1-min GHI data: it first classifies the GHI input data into different sky situations and then applies a specialized separation sub-model to each sky class. During the model's development, separate versions were built for each Koeppen-Geiger primary climate and for all climates combined, with or without extreme gradient boosting for the most challenging sky situations, namely scattered clouds and cloud enhancement events. Further details are given in the GISPLIT paper.
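To make the two-stage structure concrete, the sketch below shows the classify-then-dispatch pattern with NumPy. The class labels, the toy sub-models and the helper `split_by_class` are hypothetical: each toy model maps the clearness index to a diffuse fraction only to show how every 1-min sample is routed to the sub-model of its sky class. They are not GISPLIT's sub-models, and samples of any unlisted class fall back to a generic toy model.

```python
import numpy as np

# Hypothetical sky classes and toy sub-models, for illustration only:
# the real CAELUS classes and GISPLIT sub-models are different.
def toy_clear(kt):
    return np.full_like(kt, 0.15)  # constant diffuse fraction

def toy_overcast(kt):
    return np.ones_like(kt)  # all irradiance is diffuse

def toy_other(kt):
    return np.clip(1.0 - 0.9 * kt, 0.0, 1.0)  # generic fallback

SUBMODELS = {"clear": toy_clear, "overcast": toy_overcast}

def split_by_class(kt, sky_class):
    """Return the diffuse fraction, applying one sub-model per sky class."""
    kt = np.asarray(kt, dtype=float)
    sky_class = np.asarray(sky_class)
    k = np.full(kt.shape, np.nan)
    for name in np.unique(sky_class):
        mask = sky_class == name  # samples that belong to this class
        model = SUBMODELS.get(name, toy_other)
        k[mask] = model(kt[mask])
    return k
```

Working with one boolean mask per sky class, rather than looping over samples, keeps the dispatch vectorised even for millions of 1-min records.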
Install GISPLIT with:

```shell
python3 -m pip install git+https://github.com/jararias/gisplit@main
```

To test it and benchmark it against state-of-the-art separation models of the same class, install the `splitting_models` package:

```shell
python3 -m pip install git+https://github.com/jararias/splitting_models@main
```

and run the tests from the command line as:

```shell
python3 -c "import splitting_models.tests as sm_tests; sm_tests.basic_test()"
```

or from a Python script as:

```python
import matplotlib.pyplot as plt
import splitting_models.tests as sm_tests

sm_tests.basic_test()
plt.show()  # display the figures produced by the test
```
The first step is to create a GISPLIT instance:

```python
from gisplit import GISPLIT

gs = GISPLIT(engine="reg", climate=None)
```

It accepts two input arguments:

- `engine`: set to `'reg'` to use plain (conventional) regression models to separate the components in all sky situations, or to `'xgb'` to use extreme gradient boosting models for scattered clouds and cloud enhancement events. Defaults to `'xgb'`.
- `climate`: selects a model version trained specifically for one of the primary Koeppen-Geiger climates (namely, `'A'`, `'B'`, `'C'`, `'D'` or `'E'`), or set to `None` to use the all-climates version (default option).
Tip
The benefits of using climate-specific model versions are not clear (see the paper above). I recommend using the all-climates version (i.e., `climate=None`). In any case, use a climate-specific version only if you know the Koeppen-Geiger primary climate of your target location.
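When the full Koeppen-Geiger code of a site is known (e.g., `Cfb`), only its first letter is relevant to the `climate` argument. The helper `primary_climate` below is a hypothetical convenience, not part of GISPLIT; it returns `None` for missing or unrecognised codes, which falls back to the all-climates version.

```python
# Primary Koeppen-Geiger climate groups, named as in the BSRN example above.
PRIMARY_CLIMATES = {
    "A": "tropical",
    "B": "dry",
    "C": "temperate",
    "D": "continental",
    "E": "polar",
}

def primary_climate(kg_code):
    """Return the primary climate letter (e.g. 'C' for 'Cfb'), or None if unknown."""
    code = (kg_code or "").strip()
    if not code:
        return None  # no code given: use the all-climates version
    letter = code[0].upper()
    return letter if letter in PRIMARY_CLIMATES else None
```

For example, `GISPLIT(engine="xgb", climate=primary_climate("Cfb"))` would select the temperate (`'C'`) version.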
Once the GISPLIT instance is created, the separation is performed as follows:

```python
pred = gs.predict(data)
```

where `data` is a Pandas DataFrame with the following mandatory 1-min time series variables (columns):

- `sza`: the solar zenith angle, in degrees
- `eth`: the extraterrestrial solar irradiance, in W/m$^2$
- `ghi`: the global horizontal irradiance, in W/m$^2$
- `ghics`: the clear-sky global horizontal irradiance, in W/m$^2$
- `difcs`: the clear-sky diffuse irradiance, in W/m$^2$
In addition, it requires:

- `longitude`: the site's longitude, in degrees
- `ghicda`: the cloudless-and-clean-and-dry-sky global horizontal irradiance, in W/m$^2$

to evaluate the sky type with CAELUS. However, if the sky type is already known (e.g., because it has been previously evaluated with CAELUS) and is passed to `predict` through the `sky_type_or_func` input argument, `longitude` and `ghicda` are not necessary.
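A quick pre-flight check can catch missing columns before calling `predict`. The sketch below is a hypothetical helper (GISPLIT itself may validate its input differently); it only encodes the column lists given above, adding `longitude` and `ghicda` when the sky type still has to be computed with CAELUS.

```python
import pandas as pd

# Column names as documented for the input of predict.
BASE_COLUMNS = ["sza", "eth", "ghi", "ghics", "difcs"]
CAELUS_COLUMNS = ["longitude", "ghicda"]  # only needed to compute the sky type

def missing_columns(data: pd.DataFrame, sky_type_known: bool) -> list:
    """List the required columns absent from `data` (hypothetical helper)."""
    required = BASE_COLUMNS if sky_type_known else BASE_COLUMNS + CAELUS_COLUMNS
    return [name for name in required if name not in data.columns]
```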
For instance, if the sky type is available in the `sky_type` column of `data`, you could do:

```python
pred = gs.predict(data, sky_type_or_func=data.sky_type.values)
```

or:

```python
pred = gs.predict(data, sky_type_or_func=lambda df: df.sky_type.values)
```

The `sky_type_or_func` argument accepts either a Pandas Series or NumPy 1-d array of CAELUS sky types with the same length as `data`, or a function that receives `data` as input and returns the sky type.
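The two accepted forms of `sky_type_or_func` can be reduced to a single array as in the sketch below. `resolve_sky_type` is a hypothetical helper written from the description above, not GISPLIT's internal code; its length check mirrors the requirement that the sky types match the length of `data`.

```python
import numpy as np
import pandas as pd

def resolve_sky_type(data: pd.DataFrame, sky_type_or_func) -> np.ndarray:
    """Turn an array/Series or a callable into a 1-d array of sky types.

    Hypothetical helper mirroring the documented behaviour of predict.
    """
    if callable(sky_type_or_func):
        sky_type = sky_type_or_func(data)  # the function receives `data`
    else:
        sky_type = sky_type_or_func  # Series or 1-d array given directly
    sky_type = np.asarray(sky_type)
    if sky_type.ndim != 1 or len(sky_type) != len(data):
        raise ValueError("sky type must be 1-d and have the same length as data")
    return sky_type
```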
Tip
If the sky type has been pre-computed, pass it to `predict`: this accelerates the separation and removes the need for `longitude` and `ghicda`.
Tip
If you are sure that the sky type is aligned with the rest of the values in `data`, use the `.values` attribute of `sky_type` to prevent the index-alignment issues that could otherwise arise.
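The alignment pitfall comes from pandas itself: assigning a Series to a DataFrame matches rows by index label, not by position. In this minimal example (placeholder GHI values and arbitrary sky-type codes), a Series indexed 0..2 finds no matching timestamps and yields only NaN, whereas `.values` assigns positionally.

```python
import pandas as pd

times = pd.date_range("2024-01-01 12:00", periods=3, freq="1min", tz="UTC")
data = pd.DataFrame({"ghi": [500.0, 510.0, 505.0]}, index=times)

# Sky types computed elsewhere but indexed by position (0, 1, 2), not by time.
sky_type = pd.Series([1, 1, 2])

aligned = data.assign(sky_type=sky_type)  # label alignment: no matches, all NaN
positional = data.assign(sky_type=sky_type.values)  # positional: as intended
```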
Important
Data gaps in the input dataframe should be kept to a minimum, because both the sky-type classification and the component separation rely on variability indicators evaluated over centered moving windows. Gaps can degrade these indicators and, in turn, the model performance.
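Before running the separation, it can help to locate the gaps in a 1-min record. The hypothetical helpers below use only pandas: `find_gaps` lists the missing minutes between the first and last timestamps, and `to_regular_grid` reindexes the data onto a complete 1-min grid so that gaps appear as explicit NaN rows. The README does not say whether GISPLIT prefers gaps as NaN rows or as missing rows, so treat the reindexing step as a diagnostic.

```python
import pandas as pd

def find_gaps(index: pd.DatetimeIndex, freq: str = "1min") -> pd.DatetimeIndex:
    """Return the timestamps missing from a nominally regular index."""
    full = pd.date_range(index.min(), index.max(), freq=freq)
    return full.difference(index)

def to_regular_grid(data: pd.DataFrame, freq: str = "1min") -> pd.DataFrame:
    """Reindex `data` onto a complete grid; missing minutes become NaN rows."""
    full = pd.date_range(data.index.min(), data.index.max(), freq=freq)
    return data.reindex(full)
```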
The index of the `data` dataframe must be either:

- a Pandas `DatetimeIndex` in coordinated universal time (UTC), or
- a Pandas `MultiIndex` with two levels: `times_utc` and `site`.
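Both index layouts can be built with plain pandas, as in the sketch below. The timestamps and GHI values are placeholders, and the site codes are only labels borrowed from the BSRN example above. `tz_localize("UTC")` marks naive timestamps that are already in UTC; timestamps recorded in a local time zone would first be localized to that zone and then converted with `tz_convert("UTC")`.

```python
import numpy as np
import pandas as pd

# Placeholder 1-min timestamps, marked as UTC.
times = pd.date_range("2024-06-01 10:00", periods=3, freq="1min").tz_localize("UTC")
ghi = np.array([610.0, 615.0, 612.0])  # placeholder values, W/m2

# Single-site layout: a UTC DatetimeIndex.
single = pd.DataFrame({"ghi": ghi}, index=times)

# Multi-site layout: a (times_utc, site) MultiIndex; site codes are arbitrary.
frames = [
    pd.DataFrame({"times_utc": times, "site": site, "ghi": ghi})
    for site in ("CAR", "BON")
]
multi = pd.concat(frames).set_index(["times_utc", "site"]).sort_index()
```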
The multi-index option is designed to run the separation for several sites at once. The `times_utc` level is the same UTC `DatetimeIndex` as in the "single-index" option, while `site` is a site identifier. Internally, GISPLIT groups the input dataset by site (using the Pandas `groupby` method), performs the separation site by site, and then combines the results.
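The group-by-site strategy described above can be sketched as follows. `per_site` is a hypothetical helper, not GISPLIT's internal implementation: it applies a per-site function to each sub-frame and reindexes the concatenated result to the input index, so the output rows keep the input order.

```python
import pandas as pd

def per_site(data: pd.DataFrame, func) -> pd.DataFrame:
    """Apply `func` to each site's sub-frame and reassemble in input order.

    Hypothetical sketch; `func` must return a DataFrame indexed like its input.
    """
    # Each group keeps its full (times_utc, site) index.
    parts = [func(group) for _, group in data.groupby(level="site", sort=False)]
    return pd.concat(parts).reindex(data.index)
```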
Tip
When the sky type is pre-computed in "multi-index" datasets, passing `sky_type_or_func=lambda df: df.sky_type.values` is useful to skip re-evaluating the sky type and to avoid needing `longitude` and `ghicda`.
The output of `predict` is a dataframe with exactly the same index as the input dataframe and the following columns:

- `dif`: the diffuse irradiance, in W/m$^2$
- `dir`: the direct horizontal irradiance, in W/m$^2$
- `dni`: the direct normal irradiance, in W/m$^2$
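The three outputs are tied together by the standard closure relations $\mathrm{GHI} = \mathrm{DIF} + \mathrm{DIR}$ and $\mathrm{DIR} = \mathrm{DNI}\cos\theta_z$, where $\theta_z$ is the solar zenith angle (`sza`). The hypothetical helper below computes both residuals as a sanity check on a prediction. It assumes `data` and `pred` share the same index, as stated above, and does not claim that GISPLIT enforces closure exactly.

```python
import numpy as np
import pandas as pd

def closure_residuals(data: pd.DataFrame, pred: pd.DataFrame) -> pd.DataFrame:
    """Closure residuals in W/m2 (hypothetical helper).

    ghi: input GHI minus (dif + dir); dir: dir minus dni * cos(sza).
    Both are near zero for a split that respects closure.
    """
    cosz = np.cos(np.radians(data["sza"]))
    return pd.DataFrame(
        {
            "ghi": data["ghi"] - (pred["dif"] + pred["dir"]),
            "dir": pred["dir"] - pred["dni"] * cosz,
        },
        index=data.index,
    )
```

Near the horizon, a small residual in `dir` can correspond to a large error in `dni`, because $\cos\theta_z$ is then close to zero.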
