ETL Pipeline using “Data Science Job Salaries” Dataset

Overview

This project implements an ETL (Extract, Transform, Load) pipeline using the “Data Science Job Salaries” dataset. The pipeline extracts salary data, related job titles, and salary trend data, transforms them into a single dataframe, and loads the transformed data into a database. The project also includes steps for visualising the data with graphs.
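A minimal sketch of how the three stages could fit together with pandas and SQLAlchemy is shown below. It is an illustration, not the code in etlpipeline.py. The sample rows, the column names (assumed to follow ds_salaries.csv), the per-job-title and per-year averages standing in for the "job titles" and "salary trends" data, and the table name `salaries` are all assumptions.

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical sample rows; column names are assumed to match ds_salaries.csv.
salaries = pd.DataFrame(
    {
        "work_year": [2020, 2021, 2021, 2022],
        "job_title": ["Data Scientist", "Data Scientist", "Data Engineer", "Data Engineer"],
        "salary_in_usd": [80000, 90000, 100000, 120000],
        "company_location": ["US", "GB", "US", "DE"],
    }
)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the rows, then attach per-title and per-year averages as new columns."""
    df = df.drop_duplicates().dropna(subset=["salary_in_usd"])
    # Stand-in for the "job titles" data: average salary per job title.
    by_title = (
        df.groupby("job_title", as_index=False)["salary_in_usd"]
        .mean()
        .rename(columns={"salary_in_usd": "avg_salary_for_title"})
    )
    # Stand-in for the "salary trends" data: average salary per work year.
    by_year = (
        df.groupby("work_year", as_index=False)["salary_in_usd"]
        .mean()
        .rename(columns={"salary_in_usd": "avg_salary_for_year"})
    )
    # Left joins keep every salary row and produce a single dataframe.
    return df.merge(by_title, on="job_title", how="left").merge(
        by_year, on="work_year", how="left"
    )


def load(df: pd.DataFrame, db_url: str):
    """Write the dataframe to a SQL table, replacing the table if it already exists."""
    engine = create_engine(db_url)
    df.to_sql("salaries", engine, if_exists="replace", index=False)
    return engine


combined = transform(salaries)
# A temporary SQLite file keeps the demo away from the project's your_database.db.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
engine = load(combined, f"sqlite:///{db_path}")
row_count = pd.read_sql("SELECT COUNT(*) AS n FROM salaries", engine)["n"].iloc[0]
print(combined)
print("rows loaded:", row_count)
```

Left joins are used so that every original salary row survives the transform, with the aggregate columns repeated on each matching row.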

Dependencies

The following packages have to be installed:

pandas
chardet
sqlalchemy (for create_engine)
matplotlib (matplotlib.pyplot, imported as plt)
os (part of the Python standard library; no installation needed)

Getting Started

To run the project, follow these steps:

Clone the repository to your local machine.
Open the project in any Python environment.
Before running the code, delete your_database.db from the project directory if it exists. This helps avoid write-permission errors.
Run the notebook cells to execute the ETL pipeline, save the data to the database, and visualise it with your tool of choice.
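The database cleanup step can also be done from Python with the os module, which the project already uses. This is a minimal sketch; the helper name `remove_stale_database` is hypothetical, and the demo runs against a throwaway file rather than the real database.

```python
import os
import tempfile

DB_PATH = "your_database.db"  # database file name from the steps above


def remove_stale_database(path: str = DB_PATH) -> bool:
    """Delete an existing SQLite file so the pipeline starts clean.

    Returns True if a file was removed, False if there was nothing to delete.
    """
    if os.path.exists(path):
        os.remove(path)
        return True
    return False


# Demo on a throwaway file so the real project database is left untouched.
demo_path = os.path.join(tempfile.mkdtemp(), DB_PATH)
open(demo_path, "w").close()
first = remove_stale_database(demo_path)  # the file existed, so this returns True
second = remove_stale_database(demo_path)  # already deleted, so this returns False
print(first, second)
```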

Data Sources

The following data sources were utilised for this project:

The ds_salaries.csv file was downloaded from Kaggle.
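The dependency list pairs chardet with pandas, which suggests the CSV's encoding is detected before the file is read. The sketch below assumes that approach; the function name `extract` and the small stand-in file are hypothetical, and in real use `path` would point at ds_salaries.csv.

```python
import io
import os
import tempfile

import chardet
import pandas as pd


def extract(path: str) -> pd.DataFrame:
    """Read a CSV after guessing its encoding with chardet."""
    with open(path, "rb") as f:
        raw = f.read()
    # chardet.detect returns a dict with "encoding", "confidence" and "language";
    # fall back to UTF-8 if no encoding could be guessed.
    encoding = chardet.detect(raw)["encoding"] or "utf-8"
    return pd.read_csv(io.BytesIO(raw), encoding=encoding)


# Demo with a tiny stand-in file (hypothetical rows, assumed column names).
sample = (
    "work_year,job_title,salary_in_usd\n"
    "2021,Data Scientist,90000\n"
    "2022,Data Engineer,120000\n"
)
path = os.path.join(tempfile.mkdtemp(), "ds_salaries_sample.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

df = extract(path)
print(df)
```

Reading the bytes once and handing them to pandas through `io.BytesIO` avoids opening the file twice.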

Acknowledgements

I want to express my sincere gratitude to Xander Talent for their unwavering kindness and exceptional support during my time at the academy. Their guidance, mentorship, and encouragement have been invaluable assets that not only helped me achieve my immediate objectives but also instilled in me the confidence and skills to pursue even higher aspirations in the future.

License

This project is licensed under the MIT License. Feel free to use, modify, and distribute the code for your own purposes.

Conclusion

This ETL pipeline provides an end-to-end solution for extracting, transforming, and loading salary data. It combines salary data, job titles, and location trends data into a single dataframe, enabling further analysis and visualisation. The project demonstrates how to import data from a CSV file, clean and transform it, and use visualisation to gain deeper insights.
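As one example of the visualisation step, the sketch below plots median salary per job title with matplotlib. The sample data, column names, chart type, and output file name are assumptions, not taken from the project. The Agg backend is selected so the script also runs without a display; in a notebook you can drop that line and call plt.show() instead of saving to a file.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows; column names are assumed to match ds_salaries.csv.
df = pd.DataFrame(
    {
        "job_title": ["Data Scientist", "Data Scientist", "Data Engineer", "Data Engineer", "ML Engineer"],
        "salary_in_usd": [80000, 90000, 100000, 120000, 150000],
    }
)

# Median salary per job title, highest first.
median_by_title = df.groupby("job_title")["salary_in_usd"].median().sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(8, 4))
median_by_title.plot(kind="bar", ax=ax)
ax.set_xlabel("Job title")
ax.set_ylabel("Median salary (USD)")
ax.set_title("Median salary by job title")
fig.tight_layout()

out_path = os.path.join(tempfile.mkdtemp(), "median_salary_by_title.png")
fig.savefig(out_path)
plt.close(fig)  # free the figure's memory once it has been saved
print(median_by_title)
```

The median is used here because salary distributions tend to be skewed by a few very high values; a mean would work the same way with `.mean()`.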

Please feel encouraged to personalise and improve the pipeline to align with your unique needs and scenarios. Enjoy your journey of data analysis!
