Adesoyin/data-engineering-toolkit

Data-Engineering-Toolkit Exercise

Purpose of the project

This project is designed to strengthen my understanding and practical skills in using Git and GitHub. It covers:

  1. Repository creation and setup.

  2. Branching strategy for organized development.

  3. Feature branch development for building ETL process scripts.

  4. Pull requests and merge requests to integrate changes into the main branch.

Through this project, I aim to gain a deeper grasp of the workflow, logic, and best practices involved in Git and GitHub collaboration.

Documentation

Repository Creation

The first step was the creation of a repository named "data-engineering-toolkit", with the option to add a README.md file enabled.

The HTTPS link was copied and used to clone the repository to my local machine. A .gitignore file was created in the main folder to keep the virtual environment and .pyc files out of version control, so that they are never pushed and secrets are not exposed publicly.
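As a minimal sketch of the ignore rules described above, the snippet below builds a .gitignore in a scratch repository and checks it with git check-ignore. The exact entries in the real repository are not shown in this README, so the patterns here (including the .env file for secrets) are assumptions chosen for illustration.

```shell
set -eu

# Scratch repository so the example does not touch a real project.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Assumed patterns: the virtual environment folder, compiled Python
# files, and a hypothetical .env file where secrets might be kept.
cat > .gitignore <<'EOF'
venv/
__pycache__/
*.pyc
.env
EOF

# Create sample files that the rules above should match.
mkdir -p venv
touch venv/pyvenv.cfg script.pyc .env

# git check-ignore prints every given path that matches an ignore rule.
ignored="$(git check-ignore venv/pyvenv.cfg script.pyc .env)"
echo "$ignored"
```

Running git status in the scratch repository afterwards should list only .gitignore as untracked, which is a quick way to confirm the rules before the first push.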

Branching Strategy

The Git Flow branching strategy was adopted, with a separate branch for each task or new feature. Documentation was written in README.md to make the first commit to the remote repo. New branches (Develop and feature/branch-name) were then created, and on GitHub I set up branch protection rules on the main branch to require a pull request before merging.
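The branch layout described above can be sketched locally as follows. This is an illustration in a scratch repository, not a record of the commands actually run; the user name, email, and commit message are placeholders. Branch protection rules are configured in the GitHub settings and are not part of this sketch.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
# Name the initial branch main regardless of the local default.
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# A branch needs at least one commit to point at.
echo "# data-engineering-toolkit" > README.md
git add README.md
git commit --quiet -m "Initial commit with README"

# Develop starts from main; the feature branch starts from Develop.
git branch Develop main
git branch feature/datacleaning Develop

# List every local branch, one name per line.
branches="$(git branch --format='%(refname:short)')"
echo "$branches"
```

Starting each feature branch from Develop rather than main keeps unfinished work away from the protected branch until it has passed review.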

The first written documentation was committed and pushed to the Develop branch. A pull request was then opened on the remote repo, and the changes were reviewed before being merged into the main branch.

A description of the changes and review comments were added to the pull request.

Data Cleaning Branch (feature/datacleaning branch)

A new feature branch was created for data ingestion and cleaning, and a new file called datacleaning.py was created to hold the cleaning script.

The file was added to staging, committed, and pushed to the feature/datacleaning branch on the remote repo.

git branch feature/datacleaning

git checkout feature/datacleaning

touch datacleaning.py

git add datacleaning.py
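The commands above stop at staging, while the text also mentions committing and pushing. A hedged sketch of the full cycle is shown here in a scratch setup: a local bare repository stands in for GitHub, and the commit message is illustrative, not the one used in the actual repository.

```shell
set -eu

work="$(mktemp -d)"
# A local bare repository stands in for the GitHub remote.
git init --quiet --bare "$work/remote.git"
git init --quiet "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

echo "# data-engineering-toolkit" > README.md
git add README.md
git commit --quiet -m "Initial commit with README"

# Same steps as in the README: create the branch, switch, stage the file.
git branch feature/datacleaning
git checkout --quiet feature/datacleaning
touch datacleaning.py
git add datacleaning.py

# Illustrative commit message (an assumption, not from the repository).
git commit --quiet -m "Add datacleaning.py script"

# -u records origin/feature/datacleaning as the upstream branch.
git push --quiet -u origin feature/datacleaning

# Confirm the branch now exists on the remote.
remote_branches="$(git ls-remote --heads origin)"
echo "$remote_branches"
```

With the upstream set by -u, later pushes from this branch need only git push.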

Data Transformation Branch (feature/datatransform branch)

A new feature branch was also created for data transformation, and a new file called datatransform.py was created to hold the transformation script.

The file was added to staging, committed, and pushed to the feature/datatransform branch on the remote repo as well.

git branch feature/datatransform

git checkout feature/datatransform

touch datatransform.py

git add datatransform.py

Pull request and merging to Develop branch

The Develop branch, with updates ready to be merged into the main branch.

A compare and pull request was opened from the Develop branch to the main branch.

Data Loading Branch (feature/dataload branch)

A new feature branch was created to host the .py file that loads the transformed data into its destination. The file was added to staging, committed, and pushed to the feature/dataload branch on the remote repo.

git branch feature/dataload

git checkout feature/dataload

touch dataload.py

git add dataload.py

git commit -m "added the load.py script file to merge to Develop branch"

git push origin feature/dataload

Pull Request and Code Review Process

For each push from a feature branch to the remote repo, a pull request was created and reviewed on GitHub, with a description and code screenshots attached, before being merged into the Develop branch. Periodically, I merged the Develop branch into the main branch.
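In this project the merges were done through pull requests on GitHub. As a rough local equivalent only, the sketch below reproduces the same flow with git merge --no-ff, which creates a merge commit much like the "Create a merge commit" option on a pull request. The scratch repository and the commit messages are assumptions for illustration.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "# data-engineering-toolkit" > README.md
git add README.md
git commit --quiet -m "Initial commit with README"

# Integration branch, then a feature branch that starts from it.
git branch Develop
git checkout --quiet -b feature/dataload Develop
touch dataload.py
git add dataload.py
git commit --quiet -m "Add dataload.py script"

# Step 1: the reviewed feature branch is merged into Develop.
git checkout --quiet Develop
git merge --quiet --no-ff -m "Merge feature/dataload into Develop" feature/dataload

# Step 2: Develop is periodically merged into main.
git checkout --quiet main
git merge --quiet --no-ff -m "Merge Develop into main" Develop

# Show the resulting history as a graph.
git log --oneline --graph
```

The graph makes the two integration points visible: one merge commit on Develop for the feature and one on main for the release of Develop.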

Git Commands Used

To create a virtual environment

python -m venv venv

To activate the environment (this path applies to Git Bash on Windows; on Linux or macOS the script is at venv/bin/activate)

source venv/Scripts/activate

To see status of the repo

git status

To create a new branch called "Develop", which serves as the integration branch, and then switch to it

git branch Develop

git checkout Develop
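Creating a branch and moving to it can also be done in one step. The sketch below uses git switch -c, which assumes Git 2.23 or newer, inside a scratch repository with placeholder identity settings.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "# data-engineering-toolkit" > README.md
git add README.md
git commit --quiet -m "Initial commit with README"

# Create Develop and switch to it in a single command.
git switch --quiet -c Develop

# Print the name of the branch that is now checked out.
current="$(git branch --show-current)"
echo "$current"   # prints: Develop
```

On older Git versions, git checkout -b Develop has the same effect.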

Information:

ETL Feature Branches

I created a feature branch for each new feature or script. As changes were made, each feature branch was pushed and merged into the Develop branch through a pull request with review comments. The Develop branch was then periodically merged into the main branch.

Conclusion

This exercise has deepened my knowledge of Git and GitHub integration, as well as the practical use of Git commands. I learned how to:

-- Create and manage branches.

-- Push changes from the local repository to GitHub.

-- Create pull requests for feature branches.

-- Merge feature branches into the main branch.

It also reinforced the importance of avoiding direct merges into the main branch to prevent conflicts, and highlighted the benefits of code reviews and collaboration with other developers.

This is truly superb!

Shout Out:

I never believed I could do this. All thanks to Tutor Aviator for his teaching and patience, and to the DEC-Launchpad managers. I appreciate you.
