pg_lake integrates with Data Build Tool (dbt) to update, insert, or transform data. Installing both the dbt-core (version 1.9.2+) and the database adapter dbt-postgres(version 1.9.0+) are required.
dbt runs externally over a Postgres connection. Superuser access might be needed depending on the number of changes expected.
my_dbt_project:
outputs:
dev:
type: postgres
host: atx5231hr.pgtest.us-west-2.aws.postgres.app
user: postgres
password: ....
port: 5432
dbname: postgres
schema: public
threads: 4
target: dev
S3 file location
While configuring dbt, you will need to create an environment variable to point dbt at S3, like this:
export ICEBERG_LOCATION_PREFIX=<S3 location>If you’re using the built-in Iceberg appliance for pg_lakee, your S3 bucket location can be accessed from psql with a show command, like this:
psql <connection-string> -c 'show pg_lake_iceberg.default_location_prefix'The model configuration controls how the transformation process behaves.
materialized='incremental': This tells dbt to perform incremental updates instead of fully rebuilding the table each time.unique_key='created_at': This specifies the unique identifier for each record, used to detect new records.pre_hookandpost_hook: These hooks are executed before and after the model runs. In this case, thepre_hooksets the default access method toicebergand configures the location prefix for storing Iceberg tables in S3. Thepost_hookresets these settings after the model has completed.
{{ config(
materialized='incremental',
unique_key='created_at',
pre_hook="SET default_table_access_method TO 'iceberg'; SET pg_lake_iceberg.default_location_prefix = '{{ env_var('ICEBERG_LOCATION_PREFIX', '') }}';",With dbt you can run the full range of features in the dbt-postgres adaptor for loading data and performing SQL operations in Postgres.
- Create and manage Postgres tables with pg_lake
- Load data from outside sources to Postgres
- Utilize incremental processing for new or updated data instead of full table refreshes.
MERGEactions supported
dbt can be especially helpful for adding data to Iceberg for fast analytics with pg_lake. With any data source, dbt can be configured to create and populate Iceberg table data. Included in the support for dbt with Iceberg is:
- Creating and managing Iceberg tables inside
pg_lake - Data loading from outside sources to Iceberg
- Incrementally processing and transforming data in Iceberg. This is done with a
delete+insertstrategy. Merge is not currently supported on Iceberg tables, but it can be used directly with the Postgres instance.