@muhammad-ammar muhammad-ammar commented Oct 17, 2023

JIRA: https://2u-internal.atlassian.net/browse/ENT-7602

Description: The changes in this PR are based on the col_name_or_user_var option of LOAD DATA FROM S3 (see https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html). The load_s3_data_to_mysql task loads data from CSV files into MySQL tables. Currently, for data to transfer correctly, the order of columns in the CSV file must match the order of the table columns in the toml file in prefect-flows. If a new field is added at the start or in the middle of the existing CSV columns, and that field is not present in the table schema in the toml file in prefect-flows, the data is transferred incorrectly.

The changes in this PR will:

  • Raise an exception if data cannot be transferred because a new CSV field appears at the start or in the middle of the existing columns.
  • Allow data to be loaded without requiring the column order in the CSV file to match the order in the prefect-flows toml file.
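
To make the two points above concrete, here is a minimal sketch of the kind of check they describe. It is not the PR's get_columns_load_order: the function name, its arguments, and the decision to tolerate unknown fields at the end of the CSV header are assumptions made for illustration only.

```python
def sketch_columns_load_order(csv_header, table_column_names):
    """Hypothetical helper: return table columns in CSV header order.

    Assumption: a CSV field missing from the table schema is tolerated
    only when it comes after every known field. If it appears at the
    start or in the middle, we raise instead of loading shifted data.
    """
    known = set(table_column_names)
    columns_to_load = []
    unknown_seen = []

    for field in csv_header:
        if field in known:
            if unknown_seen:
                # An unknown field precedes a known one: unsafe to load.
                raise ValueError(
                    'CSV fields {} are not in the table schema and appear '
                    'before column "{}"'.format(unknown_seen, field)
                )
            columns_to_load.append(field)
        else:
            unknown_seen.append(field)

    return columns_to_load
```

With a header of ['b', 'a', 'c'] and table columns ['a', 'b'], this returns ['b', 'a']; with a header of ['x', 'a'] and table columns ['a'], it raises ValueError.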

Release Plan:

For now, the new implementation is disabled and we are only logging details about the new changes.

We will release the whole work over multiple releases.

  • First release: Check the logs to see whether things are working as expected

  • Second release:

  • Third release: Enable the new feature to transfer data from CSV files into MySQL regardless of column order

if load_in_order:
    table_column_names = [name for name, __ in table_columns]
    columns_to_load = get_columns_load_order(s3_url, table, table_column_names)
    columns_load_order = '( {} )'.format(', '.join(columns_to_load))

Note to Reviewers: columns_load_order will be added to the LOAD DATA command below, but for now I am planning to merge the changes as they are and then check the logs to see whether the current changes are working.
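
For reviewers unfamiliar with the option, the following is a rough sketch of how a column list could be appended to a LOAD DATA FROM S3 statement once the feature is enabled. The function name, the PREFIX keyword, the field delimiters, and the IGNORE 1 LINES clause are assumptions for illustration; the actual command in load_s3_data_to_mysql is not shown in this conversation and may differ.

```python
def sketch_load_data_query(s3_url, table, columns_to_load):
    """Hypothetical helper: build a LOAD DATA FROM S3 statement.

    The trailing column list is the col_name_or_user_var clause, which
    maps CSV fields to table columns by position in the list.
    """
    columns_load_order = '( {} )'.format(', '.join(columns_to_load))
    return (
        "LOAD DATA FROM S3 PREFIX '{s3_url}' "
        "INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "IGNORE 1 LINES "  # assumed: skip the CSV header row
        "{columns}"
    ).format(s3_url=s3_url, table=table, columns=columns_load_order)
```

Because the column list follows the CSV field order rather than the table's column order, the same statement keeps working when the CSV columns are reordered.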

@muhammad-ammar muhammad-ammar force-pushed the ammar/load-data-irrespective-of-column-ordering branch 3 times, most recently from 0be1916 to cc9b1cf Compare October 18, 2023 10:42
@muhammad-ammar muhammad-ammar force-pushed the ammar/load-data-irrespective-of-column-ordering branch from cc9b1cf to ad735e7 Compare October 19, 2023 09:50