etl homework by leomeyer1908 · Pull Request #35 · HedgeApple/etl_homework

leomeyer1908 · 2024-01-26T04:44:58Z

My code begins by opening "homework.csv" and reading it with csv.reader from Python's csv module, which returns a reader object that iterates over the file's rows. I then extract the header row and all subsequent data rows from the reader and pass them to the transform_data function. I also pass output_headers, the list of target headers from "example.csv" that every entry in the input file must be converted to.
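The reading step can be sketched as follows. This is a minimal illustration, not the code in etl.py: the helper names parse_csv and read_input are hypothetical, and UTF-8 encoding is an assumption about "homework.csv". Splitting parsing from file opening keeps the header/rows logic testable without a file on disk.

```python
import csv


def parse_csv(lines):
    """Split CSV text (any iterable of lines) into a header row and data rows."""
    reader = csv.reader(lines)
    header = next(reader)  # first row holds the column names
    rows = list(reader)    # every later row is one item
    return header, rows


def read_input(path):
    # newline="" lets the csv module handle line endings and quoted newlines itself
    # (encoding="utf-8" is an assumption about the input file)
    with open(path, newline="", encoding="utf-8") as f:
        return parse_csv(f)
```

Usage would look like header, rows = read_input("homework.csv"), after which header, rows, and output_headers are handed to transform_data.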

Within transform_data, I first build a list of dictionaries, one per item in "homework.csv". Each dictionary uses the input header names as keys and maps each one to that item's value in the corresponding column. This structure allows O(1) lookup of any field by name in the next step, where I iterate through each row, instead of searching the header list for the index of the right column.
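A sketch of that list-of-dictionaries construction, assuming every data row has the same number of columns as the header (zip silently drops extra trailing values or stops early on short rows). The function name rows_to_dicts is hypothetical.

```python
def rows_to_dicts(header, rows):
    """Map each data row to a dict keyed by the header names."""
    # zip pairs each column name with the value in the same position, so a later
    # lookup such as item["price"] is an O(1) dict access rather than a search
    # through header for the matching index.
    return [dict(zip(header, row)) for row in rows]
```

As a design note, the standard library's csv.DictReader yields rows already keyed by the header, so it could replace both the manual header extraction and this helper.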

Next, I create output_items, a similar list of dictionaries for the output items. For each input row, I map every output header to its corresponding input key and apply the transformations specified in the instructions. Only two transformations were required: converting UPC to EAN13 and normalizing currency values. Converting UPC to EAN13 involved prepending a '0' to the value and inserting dashes at the proper positions. Normalizing currency involved representing empty entries as "0.00", removing the dollar sign if present, removing commas, and rounding to 2 decimal places.
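The two transformations might look like the sketch below. Several details are assumptions rather than facts from the PR: the dash layout (1-6-6 digit groups, e.g. 0-123456-789012) is a guess and should be copied from "example.csv"; empty UPCs are passed through as empty strings; and Decimal with ROUND_HALF_UP is used instead of float rounding so that values like 12.345 round to 12.35 without binary floating-point surprises. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def upc_to_ean13(upc):
    """Convert a 12-digit UPC-A string into a dashed EAN-13 string."""
    upc = upc.strip()
    if not upc:
        return ""  # assumption: missing UPCs stay empty
    digits = "0" + upc  # prepending 0 turns a UPC-A code into an EAN-13 code
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError(f"expected a 12-digit UPC, got {upc!r}")
    # assumed dash layout: 1-6-6 digit groups
    return f"{digits[0]}-{digits[1:7]}-{digits[7:]}"


def format_currency(value):
    """Normalize a price such as '$1,234.5' to '1234.50'; empty becomes '0.00'."""
    cleaned = value.strip().replace("$", "").replace(",", "")
    if not cleaned:
        return "0.00"
    # quantize to cents; ROUND_HALF_UP sends a trailing 5 upward
    amount = Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return str(amount)
```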

After all transformations, output_items contains a list of dictionaries that mirrors the input but uses the output headers and the transformed values. I then use output_headers to turn each item into a row with its values in header order, and return the list of rows as output_data. Finally, I use a CSV writer object to write the output header row and then each row of output_data to "formatted.csv".
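The final row-building and writing step could be sketched like this. The helper names items_to_rows and write_output are hypothetical, and filling a missing key with an empty string is an assumption about how absent fields should appear in the output.

```python
import csv


def items_to_rows(output_items, output_headers):
    """Order each item's values to match output_headers (missing keys become '')."""
    return [[item.get(h, "") for h in output_headers] for item in output_items]


def write_output(path, output_headers, output_data):
    # newline="" stops the csv module's \r\n line endings from being doubled on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(output_headers)  # header row first
        writer.writerows(output_data)    # then one row per item
```

An alternative is csv.DictWriter with fieldnames=output_headers, whose writeheader() and writerows() methods accept the dictionaries directly and remove the need for the separate row-building step.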

Created etl.py file

1ef43df

leomeyer1908 wants to merge 1 commit into HedgeApple:master from leomeyer1908:master

