| with open(filename, 'r', newline='') as file: | ||
| reader = csv.reader(file) | ||
| for row in reader: | ||
| data.append(row) |
This could be a generator and just yield row to avoid building out a potentially large in-memory data structure.
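A minimal sketch of the generator form, assuming a standalone function (the name `read_csv_rows` is hypothetical; the code under review is a method):

```python
import csv
from typing import Iterator, List


def read_csv_rows(filename: str) -> Iterator[List[str]]:
    """Yield rows one at a time instead of collecting them in a list."""
    # newline='' lets the csv module handle line endings itself.
    with open(filename, 'r', newline='') as file:
        for row in csv.reader(file):
            yield row
```

One trade-off to keep in mind: the file stays open until the generator is exhausted or closed, so callers should consume it promptly.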
| return data | ||
| def write_csv(self, filename: str, data: List[List[Any]]) -> None: |
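You can use Iterable[list[object]] here: the method only iterates over the rows, and Iterable is covariant in its element type (list itself is invariant), so any iterable of rows is accepted rather than only a List.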
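A sketch of the looser signature, written as a standalone function for brevity. Because list is invariant, a list[str] row is not a list[object], so this sketch swaps in Sequence[object] for the row type; that substitution is an assumption of the sketch, not part of the suggestion above.

```python
import csv
from typing import Iterable, Sequence


def write_csv(filename: str, data: Iterable[Sequence[object]]) -> None:
    """Write rows to a CSV file; accepts lists, tuples, or a generator of rows."""
    # newline='' prevents extra blank lines on platforms that translate '\n'.
    with open(filename, 'w', newline='') as file:
        csv.writer(file).writerows(data)
```

With this annotation a generator of rows type-checks as well as a fully built list of lists.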
| for column_name in self.input_file_headers: | ||
| # Get the input column name corresponding to the output column name | ||
| input_column_index = list(self.columns_mapping.keys()).index(column_name) |
Move this list() call out of the loop and assign the result to a temporary variable, so the list is not rebuilt on every iteration of a tight loop.
| # Get the input column name corresponding to the output column name | ||
| input_column_index = list(self.columns_mapping.keys()).index(column_name) | ||
| if list(self.columns_mapping.values())[input_column_index] is None: |
Same here. Build the list() once and reuse the variable instead of calling list() again in the hot path.
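A sketch of the hoisting suggested in these two comments, assuming a standalone helper (the name `unmapped_columns` and its return value are hypothetical, chosen only to make the loop testable):

```python
from typing import Dict, List, Optional


def unmapped_columns(headers: List[str],
                     columns_mapping: Dict[str, Optional[str]]) -> List[str]:
    """Return the headers whose mapped output column is None."""
    # Build both lists once, outside the loop, instead of once per header.
    mapping_keys = list(columns_mapping.keys())
    mapping_values = list(columns_mapping.values())

    skipped = []
    for column_name in headers:
        input_column_index = mapping_keys.index(column_name)
        if mapping_values[input_column_index] is None:
            skipped.append(column_name)
    return skipped
```

Since keys() and values() iterate in the same order, `columns_mapping[column_name] is None` should give the same answer with a single dictionary lookup. The one difference is that a missing column raises KeyError rather than ValueError.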
| transformed_value = self.transform_column_value(column_name, value) | ||
| transformed_row.append(transformed_value) | ||
| transformed_data.append(transformed_row) |
Again, this could be a generator using yield to avoid building out a large intermediate data structure
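A sketch of the transform step as a generator. The function name `transform_rows` and the `transform` callable parameter are hypothetical stand-ins for the method under review and for `transform_column_value`:

```python
from typing import Callable, Iterable, Iterator, List


def transform_rows(
    rows: Iterable[List[str]],
    headers: List[str],
    transform: Callable[[str, str], str],
) -> Iterator[List[str]]:
    """Lazily yield one transformed row per input row."""
    for row in rows:
        # zip pairs each value with its column name; the row is yielded as
        # soon as it is built rather than appended to an intermediate list.
        yield [transform(name, value) for name, value in zip(headers, row)]
```

Chained with a generator-based reader and `csv.writer(...).writerows(...)`, this would keep only one row in memory at a time.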
| elif unit == 'feet': | ||
| # 1 feet equal to 12 inches | ||
| rounded_amount = round(float(value) * 12, 2) | ||
| return str(rounded_amount) |
There is no fallback branch: if unit is neither "cm" nor "feet", this returns None without raising an error.
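One way to close the gap is an explicit else branch that raises. This is a sketch under assumptions: the function name `convert_to_inches` is hypothetical, the cm branch is not shown in the snippet so the division by 2.54 is filled in here, and returning an empty string for empty input is a guess at the intended behavior.

```python
def convert_to_inches(value: str, unit: str) -> str:
    """Convert a measurement to inches, failing loudly on unknown units."""
    if not value:
        # Assumption: empty input passes through as an empty string.
        return ''
    if unit == 'cm':
        # 1 inch equals 2.54 cm
        return str(round(float(value) / 2.54, 2))
    elif unit == 'feet':
        # 1 foot equals 12 inches
        return str(round(float(value) * 12, 2))
    else:
        # Surface bad configuration instead of silently returning None.
        raise ValueError(f"Unsupported dimension unit: {unit!r}")
```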
| - The value of the measurement converted to inches. | ||
| """ | ||
| if value: | ||
| if unit == 'cm': |
Comparison is case sensitive.
If unit == 'CM' (uppercase), this will return None without an error
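A small sketch of normalizing the unit before comparing. The helper name `normalize_unit` is hypothetical:

```python
def normalize_unit(unit: str) -> str:
    """Return the unit in a canonical form for comparison."""
    # strip() drops surrounding whitespace; casefold() is a more aggressive
    # lower() intended for caseless matching.
    return unit.strip().casefold()
```

Calling `unit = normalize_unit(unit)` at the top of the conversion function would let 'CM', 'Cm', and ' cm ' all take the 'cm' branch.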
| Returns: | ||
| - The value of the measurement converted to pounds. | ||
| """ | ||
| if unit == 'kg': |
Same issues as the dimension conversion: an unrecognized or differently cased unit returns None without an error.
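A table-driven sketch that addresses both points for weights. The names `POUNDS_PER_UNIT` and `convert_to_pounds` are hypothetical, and only 'kg' is included because it is the only unit visible in the snippet:

```python
# Pounds per unit. Only 'kg' appears in the code under review; further
# entries would be additions, not existing behavior.
POUNDS_PER_UNIT = {'kg': 2.20462}


def convert_to_pounds(value: str, unit: str) -> str:
    """Convert a weight to pounds, ignoring case and rejecting unknown units."""
    key = unit.strip().lower()
    if key not in POUNDS_PER_UNIT:
        raise ValueError(f"Unsupported weight unit: {unit!r}")
    return str(round(float(value) * POUNDS_PER_UNIT[key], 2))
```

Note that an empty value makes `float()` raise ValueError in this sketch; whether empty cells should pass through instead is a decision for the author.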
ETL Service
Overview
This Python project implements an Extract, Transform, Load (ETL) service that processes data from CSV files. It reads data from an input CSV file, applies transformations based on specified rules, and then writes the transformed data to an output CSV file.
Features
coverage run -m pytest ..
How to Use
Usage
Prepare your input CSV file containing the raw data to be processed.
Define a columns mapping file in JSON format. This file specifies how each column in the input file should be transformed. An example columns mapping file might look like this:
{
  "system creation date": "date",
  "wholesale ($)": "wholesale_price",
  "item width (cm)": "width_inches",
  "item length (feet)": "length_inches",
  "item weight (kg)": "weight_pounds",
  "upc": "upc_code"
}
Replace input.csv with the path to your input CSV file, output.csv with the desired path for the output CSV file, and columns_mapping.json with the path to your columns mapping file.
Example
Let's say we have an input CSV file data.csv containing the following data:
And a columns mapping file columns_mapping.json as shown above.
Running the ETL service with the following command:
Will result in a new CSV file transformed_data.csv with the following data: