Welcome! This repo contains everything you need for Week 1 of the Data Engineering program: environment setup, Bash/VS Code/Jupyter intros, Git + GitHub workflows, Python practice (basics → intermediate → advanced), and a Pandas primer.
Tip: Keep commits small (e.g., one per exercise). When a notebook’s tests pass, Restart Kernel → Run All to confirm a clean run.
- Clone the repo
git clone <your-fork-or-classroom-url>.git
cd WEEK1-PYTHON-FOR-DATA-ENGINEERING- Create a project virtual environment (recommended: Python 3.11.3)
-
Linux / macOS
python -m venv .venv source .venv/bin/activate -
Windows (PowerShell)
python -m venv .venv .\.venv\Scripts\Activate.ps1
- (Optional) Jupyter & data stack for local runs
pip install notebook jupyterlab pandas numpy matplotlib seaborn- Launch notebooks
jupyter lab
# or
jupyter notebookIn VS Code, open the folder and run: Command Palette → “Python: Select Interpreter” → pick
.venv.
- Tooling: Bash, VS Code, Jupyter, Google Colab
- Version control: Git + GitHub (clone, branch, commit, PR)
- Python: Core syntax & flow, data structures, functions, error handling, iterators/generators, performance tips
- Pandas: Series/DataFrame basics, transforms, visualization
-
01_welcome/welcome.md— course kickoff and expectations.
-
02_installation_setup/[Time Allocation - 2nd half of Day 1]setup_for_linux/— Linux install guides (Bash, Docker, Git, Python/pyenv, VS Code, PostgreSQL/pgAdmin, Jupyter).setup_for_mac/— macOS install guides (Homebrew, Bash, Docker, Git, Python/pyenv, VS Code, PostgreSQL/pgAdmin, Jupyter).setup_for_windows/— Windows install guides (Git Bash/WSL notes, Docker Desktop, Git, Python/pyenv-win, VS Code, PostgreSQL/pgAdmin, Jupyter).vscode_venv/— how to use virtual environments with VS Code (Windows/macOS). Use these if your machine isn’t set up yet.
-
03_bash_jupyter_vscode_colab_intro/bash.md— Bash intro + practice game (Bandit).vscode.md— VS Code essentials for this course.jupyter.md— Jupyter Notebook/Lab walkthrough.colab.md— Using Google Colab.
-
04_git_github/git_github_intro.md— class workflow: fork/clone, feature branches, commits, PRs, resolving simple conflicts.
-
05_python_practice/-
python_basics/— notebooks + exercise folder. Topics:- Numeric variable types, Strings, If/Elif/Else, Loops
- Lists, Sets, Mutability, Dictionaries, Comprehensions
- Functions (intro/definitions/calling/challenge)
- Each student notebook has TODOs and
asserttests.
-
python_intermediate/— notebooks + exercise folder:- Error handling, Iterators & Generators, Lambda/Map/Filter/Reduce, Performance.
-
python_advanced/— notebooks + exercise folder:- OOP introduction, Concurrency & Parallelism.
-
-
06_pandas_intro/01_pandas.ipynb→ foundations (Series/DataFrame, indexing, I/O).02_pandas_practice_1.ipynb,04_pandas_practice_2.ipynb,05_pandas_practice_3.ipynb→ progressively harder practice.03_pandas_visualization.ipynb→ quick plotting.data/— sample CSVs/parquet used by the notebooks. Don’t move/rename.