Generate the date dimension for a data warehouse, with country-specific holidays added using python holidays.
Before running the script, set up the python virtual environment:
python3 -m venv env && source env/bin/activate
Then install using your package manager of choice:
python3 -m pip install .
OR
poetry install
To generate the date dimension table, e.g. between Jan 1 and Dec 31 2022, in csv format, with a single boolean flag for US and BR holidays:
datedim_generate -s 2022-01-01 -e 2022-12-31 -c US BR -o csv
Alternatively, to generate the same date dimension table, but with separate columns for is_holiday_US, holiday_name_US, is_holiday_BR, and holiday_name_BR:
datedim_generate -s 2022-01-01 -e 2022-12-31 -c US BR -o csv --holiday_names_columns
The generated file can then be uploaded into Snowflake via the drag-and-drop interface, or imported into another database of your choice.
For more information about usage, run:
datedim_generate -h
Internally, the python script makes use of the polars package.
The schema of the date dimension table is based loosely on the DATE dimension in https://www.cs.umb.edu/~poneil/StarSchemaB.PDF, which is a standard date dimension table.
The following columns will always be generated:
| Column | Type | Description |
|---|---|---|
| datekey | int | YYYYMMDD representation of date |
| date_raw | date | The raw date |
| dayofweek | text | The full name of the day of the week (e.g. Monday) |
| dayofweek_short | text | The three-letter abbreviation of the day of the week (e.g. Mon) |
| month | text | The full name of the month (e.g. August) |
| year | int | The year, e.g. 2021 |
| yearmonthnum | int | YYYYMM representation of the date, e.g. 202406 |
| monthyear | text | The three-letter abbreviation of the month, plus the year, e.g. Aug2024 |
| daynuminweek | int | The number of the day of week, where 1=Monday and 7=Sunday |
| daynuminmonth | int | The day number in the month (for February 2, 1995 this would be 2) |
| daynuminyear | int | The day number in the year, from 1-366 (depending on leap year; for December 31, 2022 the value would be 365) |
| monthnuminyear | int | The month number in the year |
| iso_year | int | The year of the date in the ISO week numbering (may differ from year) |
| iso_weeknuminyear | int | The week number in the ISO week numbering |
| is_last_day_in_week | boolean | Indicates that this date is the last in the week |
| is_last_day_in_month | boolean | Indicates that this date is the last in the month |
| is_holiday | boolean | Indicates that this date is a holiday in at least one of the provided calendars |
| is_weekday | boolean | Indicates that this date is a weekday |
Additionally, the following columns will be generated for each country, if the --holiday_names_columns argument is provided:
| Column | Type | Description |
|---|---|---|
| is_holiday_<COUNTRY> | boolean | Indicates that this date is a holiday in the <COUNTRY> calendar. |
| holiday_name_<COUNTRY> | text | The name of the holiday in the <COUNTRY> calendar, if available. |
This also applies for financial holidays (e.g. ECB).