An ETL script that transforms the data per the requirements and stores it for further processing. The script is written in Python 3.9.
- The S3 bucket name needs to be defined, and the files inside the bucket must be accessible publicly (or by the company domain).
- The source file is uploaded daily before 9 AM.
- The file used in this ETL script is static (provided with the problem statement); code is also provided in case one wants to pull the file from the S3 bucket instead.
- The results are stored in CSV format on the local system and are used to start the ETL pipeline. (This result data can be uploaded to any kind of target environment, including MySQL, MongoDB, an AWS service, etc.)
- The logs of the ETL pipeline are stored in logs.logfile, so the user can review them at any time without having to monitor the pipeline live.
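The I/O steps described above can be sketched as follows. This is a minimal illustration, not the actual script: the bucket name, file key, and the `logs.logfile` path are assumptions, and `download_source_from_s3` is only needed if one opts to pull the file from S3 (it requires `boto3`).

```python
import csv
import logging


def configure_logging(logfile="logs.logfile"):
    """Send pipeline logs to a file so they can be reviewed at any time."""
    logging.basicConfig(
        filename=logfile,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )


def download_source_from_s3(bucket, key, dest):
    """Optional: pull the daily source file from the S3 bucket."""
    import boto3  # imported lazily; only required when pulling from S3
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, dest)


def store_results(rows, path="results.csv"):
    """Store the transformed rows as CSV on the local system."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    logging.info("Wrote %d rows to %s", len(rows), path)
```

The S3 download is kept separate from the CSV/logging steps so the script still runs against the static file when no bucket is configured.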
To install dependencies, run the following command in the terminal:

pip install -r requirements.txt

To start the ETL script, run:

python main.py

pytest is used to test the user-defined functions developed in the script. The unit-testing script covers the basic user-defined functions. To start the tests, run the following command in the terminal; it will automatically find the test script and execute it:

pytest

The ETL script can run in endless mode by setting up a cron job so that the script executes periodically. Open the crontab with:
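The real test script targets the functions defined in main.py; as a purely hypothetical illustration of the pytest style, a test file (e.g. test_main.py) might look like this, where transform_record is an assumed stand-in name:

```python
# test_main.py -- illustrative only; the actual script defines its own functions.

def transform_record(record):
    """Toy stand-in for one of the script's transformation functions."""
    return {k.strip().lower(): v for k, v in record.items()}


def test_transform_record_normalises_keys():
    assert transform_record({" Name ": "Ada"}) == {"name": "Ada"}
```

pytest discovers any file matching test_*.py and runs every function whose name starts with test_, which is why the bare pytest command is enough.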
crontab -e

Select the editor of your choice, add the command you want cron to execute, and save the file:
0 9 * * * python ./main.py
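The cron expression above runs main.py daily at 9:00. If cron is unavailable, endless mode can be approximated inside Python itself; the helper below is an illustrative sketch (not part of the script), and run_pipeline is an assumed entry point:

```python
import datetime as dt
import time


def seconds_until(hour, now=None):
    """Seconds from `now` until the next occurrence of `hour`:00."""
    now = now or dt.datetime.now()
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:  # today's slot already passed; schedule tomorrow
        target += dt.timedelta(days=1)
    return (target - now).total_seconds()


# Endless mode: sleep until 9 AM, run the pipeline, repeat.
# while True:
#     time.sleep(seconds_until(9))
#     run_pipeline()  # assumed entry point in main.py
```

A cron job is still the more robust choice, since it survives reboots and does not keep a Python process alive between runs.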