- Dataframe operator now allows a user to either append to a table or replace a table with the `if_exists` parameter. #1379
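The difference between the two `if_exists` behaviours can be illustrated with a minimal sqlite3 sketch (plain Python, not SDK code; the table, column, and helper names here are made up for illustration):

```python
import sqlite3

def write_rows(conn, table, rows, if_exists="append"):
    # "replace" drops and recreates the table; "append" adds to the existing one.
    cur = conn.cursor()
    if if_exists == "replace":
        cur.execute(f"DROP TABLE IF EXISTS {table}")
    cur.execute(f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER)")
    cur.executemany(f"INSERT INTO {table} VALUES (?)", [(r,) for r in rows])
    conn.commit()

conn = sqlite3.connect(":memory:")
write_rows(conn, "t", [1, 2])                    # creates t with 2 rows
write_rows(conn, "t", [3], if_exists="append")   # t now has 3 rows
count_after_append = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
write_rows(conn, "t", [9], if_exists="replace")  # t recreated with 1 row
count_after_replace = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```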
- Fix the `aql.cleanup()` operator failing because the attribute `output` was only implemented in 2.4.0. #1359
- Fix backward compatibility with `apache-airflow-providers-snowflake==4.0.2`. #1351
- LoadFile operator returns a dataframe if not using an XCom backend. #1348, #1337
- Fix the functionality to create region-specific temporary schemas when they don't exist in the same region. #1369
- Cross-link to the API reference page from the Operators page. #1383
- Improve the integration tests to count the number of rows impacted for database operations. #1273
- Run python-sdk tests with Airflow 2.5.0 and fix the CI failures. #1232, #1351, #1317, #1337
- Deprecate `export_file` before renaming it to `export_table_to_file`. #1411
- Remove the need to use a custom XCom backend for storing dataframes when XCom pickling is disabled. #1334, #1331, #1319
- Add support for Google Drive to be used as a `FileLocation`. Example loading a file from Google Drive to Snowflake: #1044

  ```python
  aql.load_file(
      input_file=File(
          path="gdrive://sample-google-drive/sample.csv", conn_id="gdrive_conn"
      ),
      output_table=Table(
          conn_id=SNOWFLAKE_CONN_ID,
          metadata=Metadata(
              database=os.environ["SNOWFLAKE_DATABASE"],
              schema=os.environ["SNOWFLAKE_SCHEMA"],
          ),
      ),
  )
  ```
- Use `DefaultExtractor` from OpenLineage. Users need not set the environment variable `OPENLINEAGE_EXTRACTORS` to use OpenLineage. #1223, #1292
- Generate constraints files for multiple Python and Airflow versions that display the set of "installable" constraints for a particular Python (3.7, 3.8, 3.9) and Airflow version (2.2.5, 2.3.4, 2.4.2). #1226
- Improve the logs when a native transfer falls back to Pandas, and add a fallback indication in `LoadFileOperator`. #1263
- Temporary tables should be cleaned up, even with mapped tasks, via `aql.cleanup()`. #963
- Update the name and namespace as per the new OpenLineage conventions introduced here. #1281
- Delete the Snowflake stage when `LoadFileOperator` fails. #1262
- Update the documentation for Google Drive support. #1044
- Update the documentation to remove the environment variable `OPENLINEAGE_EXTRACTORS` to use OpenLineage. #1292
- Fix the GCS path in `aql.export_file` in the example DAGs. #1339
- When `if_exists` is set to `replace` in the Dataframe operator, replace the table rather than appending to it. This change fixes a regression in the Dataframe operator which caused it to append content to an output table instead of replacing it. #1260
- Pass the table metadata `database` value to the underlying Airflow `PostgresHook` instead of `schema`, as `schema` was renamed to `database` in Airflow as per this PR. #1276
- Include a description of pickling and the usage of a custom XCom backend in README.md #1203
- Investigate and fix tests that are filling up Snowflake database with tmp tables as part of our CI execution. #738
- Make `openlineage` an optional dependency #1252
- Update snowflake-sqlalchemy version #1228
- Raise an error if the dataframe is empty #1238
- Raise an error on database mismatch of an operation #1233
- Pass `task_id` to be used for the parent class on `LoadFileOperator` init #1259
- Add support for Minio #750
- Open Lineage support - Add extractors for `ExportFileOperator` and `DataframeOperator` #903, #1183
- Add check for missing conn_id on transform operator. #1152
- Raise an error when the `copy into` query fails in Snowflake. #890
- Transform op - database/schema is not picked from the table's metadata. #1034
- Change the namespace for Open Lineage #1179
- Add `LOAD_FILE_ENABLE_NATIVE_FALLBACK` config to globally disable native fallback #1089
- Add `OPENLINEAGE_EMIT_TEMP_TABLE_EVENT` config to emit events for temporary tables in Open Lineage. #1121
- Fix an issue with fetching the table row count for Snowflake #1145
- Generate a unique Open Lineage namespace for SQLite-based operations #1141
- Include a section in the docs covering file patterns for the native path of GCS to BigQuery. #800
- Add guide for Open Lineage integration with Astro Python SDK #1116
- Pin SQLAlchemy version to >=1.3.18,<1.4.42 #1185
- Remove the dependency on `AIRFLOW__CORE__ENABLE_XCOM_PICKLING`. Users can set new environment variables, namely `AIRFLOW__ASTRO_SDK__XCOM_STORAGE_CONN_ID` and `AIRFLOW__ASTRO_SDK__XCOM_STORAGE_URL`, and use a custom XCom backend, namely `AstroCustomXcomBackend`, which enables the XCom data to be saved to an S3 or GCS location. #795, #997
- Added OpenLineage support for `LoadFileOperator`, `AppendOperator`, `TransformOperator` and `MergeOperator` #898, #899, #902, #901 and #900
- Add `TransformFileOperator` that:
  - parses a SQL file with templating
  - applies all needed parameters
  - runs the SQL to return a table object

  To keep the `aql.transform_file` function, the function can return `TransformFileOperator().output` in a similar fashion to the merge operator. #892
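As a sketch, the two storage variables named above might be set as follows (the connection id and bucket URL are placeholder values, not SDK defaults; only the variable names come from the entry above):

```shell
# Placeholder values for the custom XCom backend storage location.
export AIRFLOW__ASTRO_SDK__XCOM_STORAGE_CONN_ID=aws_default
export AIRFLOW__ASTRO_SDK__XCOM_STORAGE_URL=s3://my-bucket/xcom
```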
- Add the implementation of row count for `BaseTable`. #1073
- Improved handling of Snowflake identifiers for a smooth experience with the `dataframe`, `run_raw_sql` and `load_file` operators. #917, #1098
- Fix `transform_file` to not depend on the `transform` decorator #1004
- Set the CI to run and publish benchmark reports once a week #443
- Fix a cyclic dependency and improve import time. Reduces the import time for `astro/databases/__init__.py` from 23.254 seconds to 0.062 seconds #1013
- Create GETTING_STARTED.md #1036
- Document the Open Lineage facets published by Astro Python SDK. #1086
- Documentation changes to specify permissions needed for running BigQuery jobs. #896
- Document the details on custom XCOM. #1100
- Document the benchmarking process. #1017
- Include a detailed description on the default Dataset concept in Astro Python SDK. #1092
- NFS volume mount in Kubernetes to test benchmarking from local to databases. #883
- Add filetype when resolving path in case of loading into dataframe #881
- Fix postgres performance regression (example from one_gb file - 5.56min to 1.84min) #876
- Add native autodetect schema feature #780
- Allow users to disable auto addition of inlets/outlets via airflow.cfg #858
- Support for Datasets introduced in Airflow 2.4 #786, #808, #862, #871
- `inlets` and `outlets` will be automatically set for all the operators.
- Users can now schedule DAGs on `File` and `Table` objects. Example:

  ```python
  input_file = File(
      path="https://raw.githubusercontent.com/astronomer/astro-sdk/main/tests/data/imdb_v2.csv"
  )
  imdb_movies_table = Table(name="imdb_movies", conn_id="sqlite_default")
  top_animations_table = Table(name="top_animation", conn_id="sqlite_default")
  START_DATE = datetime(2022, 9, 1)


  @aql.transform()
  def get_top_five_animations(input_table: Table):
      return """
          SELECT title, rating
          FROM {{input_table}}
          WHERE genre1='Animation'
          ORDER BY rating desc
          LIMIT 5;
      """


  with DAG(
      dag_id="example_dataset_producer",
      schedule=None,
      start_date=START_DATE,
      catchup=False,
  ) as load_dag:
      imdb_movies = aql.load_file(
          input_file=input_file,
          task_id="load_csv",
          output_table=imdb_movies_table,
      )

  with DAG(
      dag_id="example_dataset_consumer",
      schedule=[imdb_movies_table],
      start_date=START_DATE,
      catchup=False,
  ) as transform_dag:
      top_five_animations = get_top_five_animations(
          input_table=imdb_movies_table,
          output_table=top_animations_table,
      )
  ```
- Dynamic Task Templates: Tasks that can be used with Dynamic Task Mapping (Airflow 2.3+)
- Create an `upstream_tasks` parameter for dependencies independent of data transfers #585
- Avoid loading whole file into memory with load_operator for schema detection #805
- Directly pass the file to native library when native support is enabled #802
- Create file type for patterns for schema auto-detection #872
- Add a compat module for typing the execute `context` in operators #770
- Fix SQL injection issues #807
- Add `response_size` to `run_raw_sql` and warn about db thrashing #815
- Update quick start example #819
- Add links to docs from README #832
- Fix Astro CLI doc link #842
- Add configuration details from settings.py #861
- Add section explaining table metadata #774
- Fix docstring for run_raw_sql #817
- Add missing docs for Table class #788
- Add the readme.md example dag to example dags folder #681
- Add reason for enabling XCOM pickling #747
- Skip folders while processing paths in load_file operator when file pattern is passed. #733
- Limit Google Protobuf for compatibility with bigquery client. #742
- Added a check to create a table only when `if_exists` is `replace` in `aql.load_file` for Snowflake. #729
- Fix the file type for NDJSON files in the data transfer job from AWS S3 to Google BigQuery. #724
- Create a new version of imdb.csv with lowercase column names and update the examples to use it, so this change is backwards-compatible. #721, #727
- Skip folders while processing paths in load_file operator when file patterns is passed. #733
- Updated the Benchmark docs for GCS to Snowflake and S3 to Snowflake for `aql.load_file` #712, #707
- Restructured the documentation in the `project.toml`, quickstart, readthedocs and README.md #698, #704, #706
- Make astro-sdk-python compatible with major versions of Google Providers. #703
- Consolidate the documentation requirements for sphinx. #699
- Add CI/CD triggers on release branches with dependency on tests. #672
- Improved the performance of `aql.load_file` by supporting database-specific (native) load methods. This is now the default behaviour. Previously, the Astro SDK Python would always use Pandas to load files to SQL databases, which passed the data through the worker node and slowed the performance. #557, #481

  Introduced new arguments to `aql.load_file`:

  - `use_native_support` for data transfer if available on the destination (defaults to `use_native_support=True`)
  - `native_support_kwargs` is a keyword argument to be used by the method involved in the native support flow
  - `enable_native_fallback` can be used to fall back to the default transfer (defaults to `enable_native_fallback=True`)

  Now, there are three modes:

  - `Native`: the default; uses a BigQuery load job in the case of BigQuery, and Snowflake `COPY INTO` using an external stage in the case of Snowflake.
  - `Pandas`: how datasets were previously loaded. To enable this mode, use the argument `use_native_support=False` in `aql.load_file`.
  - `Hybrid`: attempts to use the native strategy to load a file to the database and, (i) if the native strategy fails, falls back to Pandas (ii) with relevant log warnings. #557
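The choice between the three modes can be sketched in plain Python (illustrative pseudologic, not the SDK's internals; only the two argument names come from the changelog entry above):

```python
# Sketch of the mode selection described above; the function and
# the native_succeeds flag are hypothetical, for illustration only.
def choose_strategy(use_native_support=True, native_succeeds=True,
                    enable_native_fallback=True):
    """Return which loader effectively runs: 'native' or 'pandas'."""
    if not use_native_support:
        return "pandas"              # Pandas mode
    if native_succeeds:
        return "native"              # Native mode (the default)
    if enable_native_fallback:
        return "pandas"              # Hybrid mode: fall back with a warning
    raise RuntimeError("Native load failed and fallback is disabled")
```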
- Allow users to specify the table schema (column types) into which a file is being loaded by using `table.columns`. If this table attribute is not set, the Astro SDK still tries to infer the schema by using Pandas (which is the previous behaviour). #532
- Add an example DAG for Dynamic Task Mapping with the Astro SDK. #377, airflow-2.3.0
- The `aql.dataframe` argument `identifiers_as_lower` (which was `boolean`, with the default set to `False`) was replaced by the argument `columns_names_capitalization` (`string` with possible values `["upper", "lower", "original"]`; the default is `lower`). #564
- `aql.load_file` previously changed the capitalization of all column titles to uppercase by default; now it makes them lowercase by default. The old behaviour can be achieved by using the argument `columns_names_capitalization="upper"`. #564
- `aql.load_file` attempts to load files to BigQuery and Snowflake by using native methods, which may have pre-requirements to work. To disable this mode, use the argument `use_native_support=False` in `aql.load_file`. #557, #481
- `aql.dataframe` will raise an exception if the default Airflow XCom backend is being used. To solve this, either use an external XCom backend, such as S3 or GCS, or set the configuration `AIRFLOW__ASTRO_SDK__DATAFRAME_ALLOW_UNSAFE_STORAGE=True`. #444
- Change the declaration of the default Astro SDK temporary schema from `AIRFLOW__ASTRO__SQL_SCHEMA` to `AIRFLOW__ASTRO_SDK__SQL_SCHEMA` #503
- Renamed `aql.truncate` to `aql.drop_table` #554
- Fix missing Airflow task terminal states in `CleanupOperator` #525
- Allow chaining `aql.drop_table` (previously `truncate`) tasks using the Task Flow API syntax. #554, #515
- Improved the performance of `aql.load_file` for the files below:
- Get configurations via the Airflow Configuration manager. #503
- Change catching `ValueError` and `AttributeError` to `DatabaseCustomError` #595
- Unpin the pandas upper-bound dependency #620
- Remove markupsafe from dependencies #623
- Added `extend_existing` to the SQLA `Table` object #626
- Move the config to store DataFrames in XCom to the settings file #537
- Make the operator names consistent #634
- Use `exc_info` for exception logging #643
- Update the query for getting the BigQuery table schema #661
- Use lazy evaluated Type Annotations from PEP 563 #650
- Provide Google Cloud Credentials env var for bigquery #679
- Handle breaking changes for Snowflake provider versions 3.2.0 and 3.1.0 #686
Feature:
Internals:
Enhancement:
- Fail LoadFileOperator operator when input_file does not exist #467
- Create scripts to launch benchmark testing to Google cloud #432
- Bump Google Provider for google extra #294
Feature:
Breaking Change:
`aql.merge` interface changed. The argument `merge_table` changed to `target_table`; `target_columns` and `merge_column` were combined into the `column` argument; `merge_keys` changed to `target_conflict_columns`; `conflict_strategy` changed to `if_conflicts`. More details can be found at #422, #466
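The simple renames above can be sketched as a plain mapping (the helper function is hypothetical, for illustration only; the combining of `target_columns` and `merge_column` into `column` is not covered by this one-to-one mapping):

```python
# Old-to-new keyword names for aql.merge, per the entry above.
MERGE_ARG_RENAMES = {
    "merge_table": "target_table",
    "merge_keys": "target_conflict_columns",
    "conflict_strategy": "if_conflicts",
}

def translate_merge_kwargs(old_kwargs):
    """Hypothetical helper: rewrite pre-change keyword arguments."""
    return {MERGE_ARG_RENAMES.get(k, k): v for k, v in old_kwargs.items()}
```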
Enhancement:
- Document (new) load_file benchmark datasets #449
- Made improvement to benchmark scripts and configurations #458, #434, #461, #460, #437, #462
- Performance evaluation for loading datasets with Astro Python SDK 0.9.2 into BigQuery #437
Bug fix:
- Change `export_file` to return a `File` object #454.
Bug fix:
- Table unable to have Airflow templated names #413
Enhancements:
- Introduction of the user-facing `Table`, `Metadata` and `File` classes
Breaking changes:
- The operator `save_file` became `export_file`
- The tasks `load_file`, `export_file` (previously `save_file`) and `run_raw_sql` should be used with `Table`, `Metadata` and `File` instances
- The decorators `dataframe`, `run_raw_sql` and `transform` should be used with `Table` and `Metadata` instances
- The operators `aggregate_check`, `boolean_check`, `render` and `stats_check` were temporarily removed
- The class `TempTable` was removed. It is possible to declare temporary tables by using `Table(temp=True)`. All temporary table names are prefixed with `_tmp_`. If the user decides to name a `Table`, it is no longer temporary, unless the user enforces it to be.
- The only mandatory property of a `Table` instance is `conn_id`. If no metadata is given, the library will try to extract the schema and other information from the connection object. If it is missing, it will default to the `AIRFLOW__ASTRO__SQL_SCHEMA` environment variable.
Internals:
- Major refactor introducing the `Database`, `File`, `FileType` and `FileLocation` concepts.
Enhancements:
- Add support for Airflow 2.3 #367.
Breaking change:
- We have renamed the artifacts we release to `astro-sdk-python` from `astro-projects`. `0.8.4` is the last version for which we have published both `astro-sdk-python` and `astro-projects`.
Bug fix:
- Do not attempt to create a schema if it already exists #329.
Bug fix:
- Support dataframes from different databases in dataframe operator #325
Enhancements:
- Add an integration test case for `SqlDecoratedOperator` to test execution of raw SQL #316
Bug fix:
- Snowflake transform without `input_table` #319
Feature:
- `load_file` support for nested NDJSON files #257
Breaking change:
`aql.dataframe` switches the capitalization to lowercase by default. This behaviour can be changed by using `identifiers_as_lower` #154
Documentation:
- Fix commands in README.md #242
- Add scripts to auto-generate Sphinx documentation
Enhancements:
- Improve type hints coverage
- Improve Amazon S3 example DAG, so it does not rely on pre-populated data #293
- Add example DAG to load/export from BigQuery #265
- Fix usages of mutable default args #267
- Enable DeepSource validation #299
- Improve code quality and coverage
Bug fixes:
- Support `gcpbigquery` connections #294
- Support the `params` argument in `aql.render` to override SQL Jinja template values #254
- Fix `aql.dataframe` when the table arg is absent #259
Others:
- Refactor integration tests, so they can run across all supported databases #229, #234, #235, #236, #206, #217
Feature:
`load_file` to a Pandas dataframe, without SQL database dependencies #77
Documentation:
- Simplify README #101
- Add Release Guidelines #160
- Add Code of Conduct #101
- Add Contribution Guidelines #101
Enhancements:
- Add SQLite example #149
- Allow customization of `task_id` when using `dataframe` #126
- Use standard AWS environment variables, as opposed to `AIRFLOW__ASTRO__CONN_AWS_DEFAULT` #175
Bug fixes:
- Fix `merge` `XComArg` support #183
- Fixes to `load_file`:
- Fixes to `render`:
- Fix `transform`, so it works with SQLite #159
Others:
Features:
- Support SQLite #86
- Support users who can't create schemas #121
- Ability to install optional dependencies (amazon, google, snowflake) #82
Enhancements:
- Change `render` so it creates a DAG as opposed to a TaskGroup #143
- Allow users to specify a custom version of `snowflake_sqlalchemy` #127
Bug fixes:
Others: