ETL Assignment Submission by gunjanmimo · Pull Request #23 · HedgeApple/etl_homework

gunjanmimo · 2023-06-03T16:59:28Z

This code defines an ETLPipeline class. It transforms a main DataFrame, using a reference DataFrame to define the target columns, and outputs a formatted DataFrame.

Here's a breakdown of the code:

The code imports the pandas library, which is used for working with DataFrames.
The ETLPipeline class is defined. Its constructor takes two arguments, main_dataframe and reference_dataframe, both of which are pandas DataFrame instances.
The constructor initializes the class by assigning the main_dataframe and reference_dataframe arguments to the corresponding instance variables.
The get_column_name_mapping method returns a dictionary that maps column names in the main DataFrame to column names in the reference DataFrame. The transform method uses it to rename columns.
The get_country_code method returns a dictionary that maps country names to their respective country codes. It is used for transforming the "country of origin" column in the DataFrame.
The upc_to_ean13_transform method is a helper that takes a UPC value and returns it converted to the EAN13 format.
The price_value_transform method is a helper that takes a price value and returns it formatted: it removes any currency symbols and commas, rounds the value to 2 decimal places, and adds a dollar sign.
The prop65_transform method takes a row from the DataFrame and checks whether the "url california label (jpg)" and "url california label (pdf)" columns are NaN. If either column is NaN, it returns False; otherwise, it returns True. This method is used to determine whether a row indicates Prop65 compliance.
The transform method is the main transformation function. It creates a new DataFrame called flattened_dataframe with the same columns as the reference DataFrame. It iterates over the column name mapping and assigns values from the main DataFrame to the corresponding columns in the flattened_dataframe.
The method performs several transformations on specific columns:

It applies the upc_to_ean13_transform function to convert the "ean13" column values to the EAN13 format.
It applies the price_value_transform function to format the "cost_price" and "min_price" columns.
It applies the prop65_transform function to determine the "prop_65" column values.
It replaces country names in the "product__country_of_origin__alpha_3" column with their respective country codes using the get_country_code dictionary.
It converts "yes" and "no" values in the "attrib__bulb_included" and "attrib__outdoor_safe" columns to True and False, respectively.
It checks whether the string "ul" is present in each value of the "attrib__ul_certified" column and sets the value to True if it is present and False otherwise.
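The helpers and column-level steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names format_price, has_prop65_labels, and apply_column_transforms, the two-entry COUNTRY_CODES dictionary, and the demo data are all hypothetical, and the UPC conversion assumes that EAN13 is obtained by left-padding the digits with zeros to 13 characters.

```python
import pandas as pd

JPG_COL = "url california label (jpg)"
PDF_COL = "url california label (pdf)"

# Hypothetical excerpt; the real get_country_code dictionary is not shown in the PR text.
COUNTRY_CODES = {"China": "CHN", "India": "IND"}


def upc_to_ean13(upc) -> str:
    """Assumed rule: keep the digits and left-pad with zeros to 13 characters."""
    return str(int(float(upc))).zfill(13)


def format_price(price) -> str:
    """Strip '$' and ',', then format with two decimals and a leading dollar sign."""
    cleaned = str(price).replace("$", "").replace(",", "").strip()
    return f"${float(cleaned):.2f}"


def has_prop65_labels(row) -> bool:
    """False if either California label URL is missing, True otherwise."""
    return not (pd.isna(row[JPG_COL]) or pd.isna(row[PDF_COL]))


def apply_column_transforms(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["ean13"] = out["ean13"].apply(upc_to_ean13)
    for col in ("cost_price", "min_price"):
        out[col] = out[col].apply(format_price)
    out["prop_65"] = out.apply(has_prop65_labels, axis=1)
    country_col = "product__country_of_origin__alpha_3"
    out[country_col] = out[country_col].replace(COUNTRY_CODES)
    for col in ("attrib__bulb_included", "attrib__outdoor_safe"):
        # Values other than yes/no become NaN under this mapping.
        out[col] = out[col].str.strip().str.lower().map({"yes": True, "no": False})
    # Plain substring test; missing values become "None"/"nan" and do not match.
    out["attrib__ul_certified"] = (
        out["attrib__ul_certified"].astype(str).str.lower().str.contains("ul", regex=False)
    )
    return out


# Made-up demo rows, for illustration only.
demo = pd.DataFrame({
    "ean13": [123456789012, 12345678905],
    "cost_price": ["$1,234.5", "19.999"],
    "min_price": ["$20", "7"],
    JPG_COL: ["https://example.com/label.jpg", None],
    PDF_COL: ["https://example.com/label.pdf", "https://example.com/label.pdf"],
    "product__country_of_origin__alpha_3": ["China", "India"],
    "attrib__bulb_included": ["Yes", "no"],
    "attrib__outdoor_safe": ["no", "yes"],
    "attrib__ul_certified": ["UL Listed", None],
})
result = apply_column_transforms(demo)
```

One thing this sketch makes visible: a substring test for "ul" also matches unrelated words that happen to contain those letters, so a stricter check may be preferable if the column holds free text.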

The __call__ method is implemented to make the class callable. It calls the transform method to obtain the formatted DataFrame and exports it to a CSV file named "formatted.csv". Finally, it returns True to indicate a successful ETL pipeline execution.
The __name__ == "__main__" block executes the code when the script is run directly. It reads two CSV files, "homework.csv" and "example.csv", into DataFrames.
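To illustrate the overall shape described above (constructor, column mapping, transform, and __call__), here is a reduced sketch. MiniPipeline, its two-entry mapping, the source column names "upc" and "wholesale ($)", and the output_path parameter are hypothetical stand-ins, and the demo uses in-memory DataFrames and a temporary directory instead of reading "homework.csv" and "example.csv".

```python
import tempfile
from pathlib import Path

import pandas as pd


class MiniPipeline:
    """Hypothetical, reduced stand-in for ETLPipeline: column renaming only."""

    def __init__(self, main_dataframe: pd.DataFrame, reference_dataframe: pd.DataFrame):
        self.main_dataframe = main_dataframe
        self.reference_dataframe = reference_dataframe

    def get_column_name_mapping(self) -> dict:
        # Hypothetical mapping: main column name -> reference column name.
        return {"upc": "ean13", "wholesale ($)": "cost_price"}

    def transform(self) -> pd.DataFrame:
        # Start from the reference columns, with one row per main-DataFrame row.
        flattened = pd.DataFrame(
            index=self.main_dataframe.index,
            columns=self.reference_dataframe.columns,
        )
        for source_col, target_col in self.get_column_name_mapping().items():
            flattened[target_col] = self.main_dataframe[source_col]
        return flattened

    def __call__(self, output_path="formatted.csv") -> bool:
        # Export the transformed frame and report success, as the PR describes.
        self.transform().to_csv(output_path, index=False)
        return True


# In the PR, the two frames come from CSV files; small in-memory frames are used here.
main_df = pd.DataFrame({"upc": ["123456789012", "234567890123"],
                        "wholesale ($)": ["10.00", "12.50"]})
reference_df = pd.DataFrame(columns=["ean13", "cost_price", "min_price"])

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "formatted.csv"
    ok = MiniPipeline(main_df, reference_df)(out_path)
    written = pd.read_csv(out_path)
```

Reference columns with no entry in the mapping (min_price here) are written as empty fields under this approach.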

gunjanmimo added 3 commits June 3, 2023 20:20

Code file and output Added

b44ff57

Doc string added

02c6082

Added requirements.txt

7cf1920

gunjanmimo wants to merge 3 commits into HedgeApple:master from gunjanmimo:master
