A curated, production-ready portfolio of PyTorch projects you can run, learn from, and showcase. Each project is small, focused, and follows the same structure, so recruiters and teammates can navigate quickly.
> Tip: Clone and run any project in minutes. Every folder has a README, configs, scripts, and a short report.
- Core: Python, PyTorch, TorchVision, TorchAudio, TorchText
- Training: Lightning (optional), Hydra configs, mixed precision
- Tracking: TensorBoard & Weights & Biases (optional)
- Quality: pre-commit, black, isort, ruff, pytest
- CI: GitHub Actions (unit tests, style checks)
```
.
├── projects/
│   ├── 01_tabular-regression_boston/
│   ├── 02_tabular-classification_bank-marketing/
│   ├── 03_cnn-mnist/
│   ├── 04_cnn-cifar10/
│   ├── 05_transfer-learning_resnet_imagenette/
│   ├── 06_nlp-sentiment_lstm_imdb/
│   ├── 07_nlp-text-classification_bert_agnews/
│   ├── 08_time-series_lstm_air-passengers/
│   ├── 09_recommendation_matrix-factorization_movielens/
│   ├── 10_segmentation_unet_carvana/
│   ├── 11_object-detection_fasterrcnn_pennfudan/
│   ├── 12_gans_dcgan_mnist/
│   └── 13_rl_dqn_cartpole/        # (bonus: shows breadth)
│
├── templates/                     # reusable code (datasets, training loops, utils)
│   ├── dataset_template.py
│   ├── model_template.py
│   ├── train_template.py
│   └── evaluate_template.py
│
├── tools/
│   ├── make_dataset.py
│   ├── train.py
│   ├── evaluate.py
│   └── infer.py
│
├── configs/                       # Hydra configs (defaults + per-project overrides)
│   ├── default.yaml
│   └── <project>.yaml
│
├── tests/
│   └── test_smoke.py
│
├── requirements.txt
├── environment.yml
├── pyproject.toml                 # black/isort/ruff config
├── .pre-commit-config.yaml
├── .github/workflows/ci.yml
└── README.md                      # this file
```
- **Tabular Regression - Boston Housing.** Goal: supervised regression, feature scaling, MAE/R², early stopping. Folder: `projects/01_tabular-regression_boston`. Metrics: MAE, RMSE, R².
- **Tabular Classification - Bank Marketing.** Goal: class imbalance, ROC-AUC/PR-AUC, calibration. Folder: `projects/02_tabular-classification_bank-marketing`.
- **CNN Basics - MNIST.** Goal: Conv/Pool/Dropout, training loop anatomy. Folder: `projects/03_cnn-mnist`.
- **CNN + Data Augmentation - CIFAR-10.** Goal: augmentations, LR schedules, CutOut/MixUp (optional). Folder: `projects/04_cnn-cifar10`.
- **Transfer Learning - ResNet on Imagenette.** Goal: fine-tuning strategies, layer freezing, discriminative LRs. Folder: `projects/05_transfer-learning_resnet_imagenette`.
- **NLP Sentiment - LSTM on IMDB.** Goal: tokenization, padding, packed sequences, embeddings. Folder: `projects/06_nlp-sentiment_lstm_imdb`.
- **NLP Text Classification - BERT on AG News.** Goal: transformers, attention masks, gradient clipping. Folder: `projects/07_nlp-text-classification_bert_agnews`.
- **Time Series Forecasting - LSTM (Air Passengers).** Goal: sliding windows, scaling, MAPE/SMAPE. Folder: `projects/08_time-series_lstm_air-passengers`.
- **Recommender - Matrix Factorization (MovieLens 100K).** Goal: implicit vs. explicit feedback, RMSE & NDCG@K. Folder: `projects/09_recommendation_matrix-factorization_movielens`. (A minimal model sketch follows this list.)
- **Image Segmentation - U-Net (Carvana or Oxford Pets).** Goal: Dice loss, IoU, augmentation for masks. Folder: `projects/10_segmentation_unet_carvana`.
- **Object Detection - Faster R-CNN (Penn-Fudan).** Goal: custom collate, anchors, mAP. Folder: `projects/11_object-detection_fasterrcnn_pennfudan`.
- **Generative - DCGAN (MNIST).** Goal: adversarial training, FID (optional). Folder: `projects/12_gans_dcgan_mnist`.
- **Reinforcement Learning - DQN (CartPole).** Goal: replay buffer, target network, ε-greedy. Folder: `projects/13_rl_dqn_cartpole`.
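For instance, the heart of the MovieLens project can be a few lines; a minimal matrix-factorization sketch (class and argument names are illustrative, not the repo's actual API):

```python
import torch
import torch.nn as nn


class MatrixFactorization(nn.Module):
    """Minimal explicit-feedback MF: rating ~ user.item + biases."""

    def __init__(self, n_users: int, n_items: int, dim: int = 32):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.item = nn.Embedding(n_items, dim)
        self.user_bias = nn.Embedding(n_users, 1)
        self.item_bias = nn.Embedding(n_items, 1)

    def forward(self, u: torch.Tensor, i: torch.Tensor) -> torch.Tensor:
        # Score is the dot product of user/item factors plus per-user/item biases.
        dot = (self.user(u) * self.item(i)).sum(dim=-1)
        return dot + self.user_bias(u).squeeze(-1) + self.item_bias(i).squeeze(-1)
```

Trained with MSE against observed ratings, this already gives a reasonable RMSE baseline; ranking metrics such as NDCG@K come from the scores it assigns to held-out items.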
You can start with 3-5 projects and grow over time. Commit early; keep results and a short write-up in each folder.
```bash
# Clone
git clone https://github.com/<your-username>/pytorch-portfolio.git
cd pytorch-portfolio

# Create environment (conda)
conda env create -f environment.yml
conda activate torch-portfolio

# Or pip
python -m venv .venv && source .venv/bin/activate  # (Windows: .venv\Scripts\activate)
pip install -r requirements.txt

# Pre-commit hooks (format + lint)
pre-commit install

# Run a project (example: CIFAR-10)
python tools/train.py project=04_cnn-cifar10 trainer.max_epochs=20

# View logs
tensorboard --logdir runs
```

All projects share a single training entrypoint (`tools/train.py`) with a Hydra config. Each project has its own YAML that overrides the defaults.
Example:

```bash
python tools/train.py project=03_cnn-mnist trainer.max_epochs=10 optimizer.lr=1e-3
```

Default config snippet (`configs/default.yaml`):
```yaml
seed: 42
project: 03_cnn-mnist

trainer:
  max_epochs: 10
  mixed_precision: true
  batch_size: 64

optimizer:
  name: adam
  lr: 0.001
  weight_decay: 0.0

model:
  name: simple_cnn
  hidden_dim: 128

logging:
  use_wandb: false
  log_dir: runs
```
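For orientation, the Hydra wiring in `tools/train.py` could look roughly like this (a sketch under the config layout above; `build_model` and `build_datasets` are hypothetical helpers, not the repo's actual API):

```python
import hydra
import torch
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="../configs", config_name="default", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra merges default.yaml, the per-project YAML, and CLI overrides.
    print(OmegaConf.to_yaml(cfg))
    torch.manual_seed(cfg.seed)
    # Hypothetical helpers -- wire in the repo's real builders here:
    # model = build_model(cfg.model)
    # train_ds, val_ds = build_datasets(cfg)


if __name__ == "__main__":
    main()
```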
The shared training loop template:

```python
# templates/train_template.py
import time

import torch
from torch.utils.data import DataLoader


def train(model, train_ds, val_ds, loss_fn, optimizer, epochs=10, batch_size=64, device="cuda"):
    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True, num_workers=2)
    val_loader = DataLoader(val_ds, batch_size=batch_size, shuffle=False, num_workers=2)
    model.to(device)
    best_val = float("inf")
    for epoch in range(1, epochs + 1):
        # training
        model.train()
        t0 = time.time()
        running = 0.0
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            pred = model(x)
            loss = loss_fn(pred, y)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # guard against exploding gradients
            optimizer.step()
            running += loss.item() * x.size(0)
        train_loss = running / len(train_loader.dataset)

        # validation
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for x, y in val_loader:
                x, y = x.to(device), y.to(device)
                pred = model(x)
                val_loss += loss_fn(pred, y).item() * x.size(0)
        val_loss /= len(val_loader.dataset)

        print(f"Epoch {epoch:02d} | train {train_loss:.4f} | val {val_loss:.4f} | {time.time() - t0:.1f}s")
        if val_loss < best_val:  # keep only the best checkpoint
            best_val = val_loss
            torch.save(model.state_dict(), "best.pt")
```
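Note that `default.yaml` sets `mixed_precision: true`, which the template above doesn't yet use. One way the inner loop could adopt automatic mixed precision (a sketch assuming a CUDA device; newer PyTorch also offers the equivalent `torch.amp` spelling):

```python
# Mixed-precision variant of the inner training loop (sketch, CUDA assumed).
scaler = torch.cuda.amp.GradScaler()
for x, y in train_loader:
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        pred = model(x)
        loss = loss_fn(pred, y)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # unscale before clipping so the norm is correct
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)
    scaler.update()
```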
Each project folder includes:

- `README.md` with dataset, model, how to run, and learning notes
- `metrics.json` (MAE/RMSE/Acc/F1/mAP, etc.)
- `figures/` (loss curves, confusion matrix, sample predictions)
- `model_card.md` (short description & limitations)
Short report template:
```markdown
# Results
- Train/Val/Test metrics: ...
- Best checkpoint: ...
- Inference speed: ...

# What I learned
- ... three bullets maximum ...

# Next steps
- ... improvements to try ...
```

- `pytest -q` runs smoke tests on tiny batches/dummy data
- `pre-commit` auto-formats code and checks style on every commit
`tests/test_smoke.py` idea:

```python
import torch


def test_torch_works():
    # Runs on CPU-only machines as well as GPU boxes: no CUDA required.
    x = torch.ones(2, 2)
    assert (x + x).sum().item() == 8.0
```
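Beyond the import check, a smoke test can push a tiny dummy batch through a model; a sketch (the `nn.Sequential` stand-in would be replaced by the repo's template model):

```python
import torch
import torch.nn as nn


def test_forward_pass_tiny_batch():
    # Stand-in model; swap in the model from templates/ in the real repo.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    x = torch.randn(4, 1, 28, 28)  # tiny MNIST-shaped dummy batch
    out = model(x)
    assert out.shape == (4, 10)
```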
requirements.txt (minimal):

```text
torch
torchvision
torchaudio
torchtext
numpy
pandas
matplotlib
scikit-learn
torchmetrics
tensorboard
hydra-core
PyYAML
rich
pre-commit
black
isort
ruff
pytest
```
environment.yml (conda):
```yaml
name: torch-portfolio
channels: [pytorch, conda-forge]
dependencies:
  - python=3.11
  - pytorch
  - torchvision
  - torchaudio
  - cudatoolkit  # or pytorch-cuda=12.1 for recent PyTorch builds
  - pip
  - pip:
      - torchtext
      - numpy
      - pandas
      - matplotlib
      - scikit-learn
      - torchmetrics
      - tensorboard
      - hydra-core
      - PyYAML
      - rich
      - pre-commit
      - black
      - isort
      - ruff
      - pytest
```

Roadmap:

- Add Lightning versions (side-by-side with pure PyTorch)
- Export ONNX + TorchScript for a couple of models (see the sketch after this list)
- Dockerfiles per project + a `Makefile`
- Hugging Face Spaces demo for 1-2 projects
- Add `inference/` notebooks with reproducible examples
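For the ONNX + TorchScript item, the export path is short; a sketch with a stand-in model (in practice you would first load a trained network from its checkpoint):

```python
import torch
import torch.nn as nn

# Stand-in model; replace with a trained network loaded from best.pt.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
example = torch.randn(1, 3, 32, 32)  # CIFAR-10-shaped dummy input

traced = torch.jit.trace(model, example)  # TorchScript via tracing
traced.save("model_ts.pt")

torch.onnx.export(model, example, "model.onnx", opset_version=17)  # ONNX
```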
- Feel free to fork and use this repo as a template for your own learning.
- Keep each project self-contained, small, and well-documented.
- Prefer readable code over clever tricks.
Add 2-3 images per project in `figures/` (loss curves, confusion matrix, sample predictions). These make your README scannable for recruiters.
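A loss-curve figure takes only a few lines to produce; a sketch (assumes a hypothetical `runs/history.json` with per-epoch losses written by the training loop):

```python
import json

import matplotlib.pyplot as plt

# Hypothetical history file saved during training.
with open("runs/history.json") as f:
    history = json.load(f)

plt.plot(history["train_loss"], label="train")
plt.plot(history["val_loss"], label="val")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.savefig("figures/loss_curve.png", dpi=150, bbox_inches="tight")
```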
Choose MIT or Apache-2.0 for simple reuse.



To add a new project:

- Duplicate a folder from `projects/03_cnn-mnist` and rename it.
- Create a `README.md` with dataset, model, how to run, and metrics.
- Add a Hydra config under `configs/<project>.yaml` (see the override sketch after this list).
- Implement `dataset.py`, `model.py`, and `train.py` (or reuse the templates).
- Log metrics to TensorBoard; save the best checkpoint.
- Add 2-3 figures and a `model_card.md`.
- Add a tiny smoke test in `tests/`.
- Update the Project Index above.
Datasets:

- MNIST / CIFAR-10 / Imagenette via `torchvision.datasets`
- IMDB / AG News via `torchtext.datasets`
- MovieLens 100K: lightweight; download script in `tools/make_dataset.py`
- Penn-Fudan: small detection dataset; download script provided
- Air Passengers: CSV in the repo for deterministic runs (see the sliding-window sketch after this list)
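As referenced in the Air Passengers item, wrapping a univariate CSV for the time-series project can be done with a sliding-window `Dataset`; a sketch (the file path and column name are assumptions about the repo's CSV):

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class WindowDataset(Dataset):
    """Predict the next value from the previous `window` values."""

    def __init__(self, csv_path: str, column: str = "passengers", window: int = 12):
        series = pd.read_csv(csv_path)[column].to_numpy(dtype="float32")
        self.series = torch.from_numpy(series)
        self.window = window

    def __len__(self):
        return len(self.series) - self.window

    def __getitem__(self, idx):
        x = self.series[idx : idx + self.window]
        y = self.series[idx + self.window]
        return x.unsqueeze(-1), y  # (window, 1) input for an LSTM, scalar target
```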
More example runs:

```bash
# MNIST
python tools/train.py project=03_cnn-mnist trainer.max_epochs=5 optimizer.lr=5e-4

# CIFAR-10 with augmentations and cosine LR
python tools/train.py project=04_cnn-cifar10 data.augment=true optimizer.lr=0.1 scheduler=cosine

# BERT text classification (AG News)
python tools/train.py project=07_nlp-text-classification_bert_agnews trainer.max_epochs=3 optimizer.lr=2e-5

# Recommender on MovieLens
python tools/train.py project=09_recommendation_matrix-factorization_movielens trainer.max_epochs=10
```

Happy training! Keep commits atomic and document what you learned in each project.