- Fork this project
- Create a Python script that reads all of the rows from `homework.csv` and outputs them to a new file `formatted.csv` using the headers from `example.csv` as a guideline. (See Transformations below for more details.)
- You may use any libraries you wish, but you must include a `requirements.txt` if you import anything outside of the standard library.
- There is no time limit for this assignment.
- You may ask any clarifying questions via email.
- When you are finished, create a pull request against this repository with an English description of how your code works.
Follow industry standards for each data type when deciding on the final format for cells.
- Dates should use ISO 8601
- Currency should be rounded to the unit of accounting. Assume USD for currency and round to cents.
- For dimensions without units, assume inches. Convert anything which isn't in inches to inches.
- For weights without units, assume pounds. Convert anything which isn't in pounds to pounds.
- UPC / GTIN / EAN values should be handled as strings
- Floating point and decimal numbers should preserve as much precision as possible
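The rules above can be illustrated with a few small helpers. This is a minimal sketch rather than the project's actual code: the function names, the accepted input date layouts, and the unit abbreviations are all assumptions, since the real contents of `homework.csv` are not shown here. It uses `Decimal` so that currency rounding and unit conversion avoid binary floating-point error.

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

# Assumed input layouts; the real data may need more entries.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

# (multiplier, divisor) pairs built from exact definitions:
# 1 in = 2.54 cm and 1 lb = 0.45359237 kg.
INCH_RATIOS = {"in": (1, 1), "ft": (12, 1), "cm": (100, 254), "mm": (10, 254)}
POUND_RATIOS = {"lb": (1, 1), "oz": (1, 16), "kg": (100000000, 45359237)}


def to_iso_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each assumed layout."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def to_cents(text):
    """Round a USD amount to cents (half up) and return it as a string."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return str(Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def _convert(value, unit, ratios):
    multiplier, divisor = ratios[unit.lower()]
    return Decimal(value) * multiplier / divisor


def to_inches(value, unit="in"):
    """Convert a length to inches; a missing unit is assumed to be inches."""
    return _convert(value, unit, INCH_RATIOS)


def to_pounds(value, unit="lb"):
    """Convert a weight to pounds; a missing unit is assumed to be pounds."""
    return _convert(value, unit, POUND_RATIOS)
```

For example, `to_cents("$1,234.565")` gives `"1234.57"` and `to_inches("25.4", "cm")` equals `Decimal("10")`. Conversions that do not divide evenly are limited by the default `Decimal` context of 28 significant digits.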
This project has a main function that calls the Transform function, in which each line corresponds to one "transformation". The work is split into several steps because multiple updates are required. At a high level, the steps are the following:
- Read the CSV files
- Rename columns according to a mapping (see the utility file, where the helpers are located)
- Transform countries into alpha-3 codes (requirements.txt includes the library used to do so)
- Apply the datatype transformation, which mostly corrects datatypes and fills missing values, among other fixes
- Add new columns to match the expected format of the output file
- Format the EAN13 column. It needed specific formatting, so I added it as a separate step; this could be improved by folding it into the datatype transformation method.
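Two of the steps above, column renaming and EAN13 formatting, can be sketched with the standard library alone. The names `COLUMN_MAP`, `rename_columns`, `format_ean13`, and `has_valid_check_digit` are hypothetical and do not come from the project's utility file, and the mapping entries are made up for illustration. The sketch assumes that the EAN13 formatting means left-padding shorter codes with zeros (a 12-digit UPC-A becomes an EAN-13 by prefixing a `0`); the project's actual rule may differ.

```python
# Hypothetical mapping; the real one lives in the project's utility file.
COLUMN_MAP = {"item number": "item_number", "upc": "ean13"}


def rename_columns(row, mapping):
    """Return a copy of the row with keys renamed; unmapped keys are kept."""
    return {mapping.get(key, key): value for key, value in row.items()}


def format_ean13(code):
    """Keep the code as a string, drop separators, and left-pad to 13 digits."""
    digits = "".join(ch for ch in str(code) if ch.isdigit())
    if len(digits) > 13:
        raise ValueError(f"too many digits for EAN-13: {code!r}")
    return digits.zfill(13)


def has_valid_check_digit(code):
    """Check the EAN-13 check digit (weights 1 and 3 alternate from the left)."""
    if len(code) != 13 or not code.isdigit():
        return False
    digits = [int(ch) for ch in code]
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]
```

Treating the code as a string throughout preserves leading zeros, which would be lost if the column were parsed as an integer.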
As a final step, the script generates a CSV file called "formatted.csv", containing the output data, in the same working directory as the input files.
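Reading the input and writing the output could look roughly like the following, using only the standard `csv` module. The helper names `read_rows` and `write_rows` are hypothetical; the sketch assumes the output headers are taken from the first line of `example.csv` and that columns missing from a row should be written as empty cells.

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def read_headers(path):
    """Return only the header row of a CSV file (e.g. example.csv)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))


def write_rows(rows, headers, path):
    """Write rows using the given headers; extra keys are dropped,
    missing keys become empty cells."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=headers, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `write_rows(rows, read_headers("example.csv"), "formatted.csv")` would then produce the output file in the current working directory.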