Solutions

GSM8K Baseline

Baseline evaluation of the Qwen 2.5 1.5B model on the GSM8K dataset

SFT Helper

SFT Experiment

Expert Iteration

GRPO

CS336 Spring 2025 Assignment 5: Alignment

For a full description of the assignment, see the assignment handout at cs336_spring2025_assignment5_alignment.pdf

We include a supplemental (and completely optional) assignment on safety alignment, instruction tuning, and RLHF at cs336_spring2025_assignment5_supplement_safety_rlhf.pdf

If you see any issues with the assignment handout or code, please feel free to raise a GitHub issue or open a pull request with a fix.

Setup

As in previous assignments, we use uv to manage dependencies.

  1. Install all packages except flash-attn first, then run a full sync to add it (flash-attn needs the other packages, such as torch, to be in place before it can be installed):
uv sync --no-install-package flash-attn
uv sync
  2. Run unit tests:
uv run pytest

Initially, all tests should fail with NotImplementedErrors. To connect your implementation to the tests, complete the functions in ./tests/adapters.py.
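
As a rough illustration of what "connecting your implementation to the tests" can look like, here is a minimal sketch of the adapter pattern. All names in it (`compute_mean`, `run_compute_mean`, `run_compute_mean_unimplemented`) are hypothetical placeholders and are not taken from ./tests/adapters.py; the real adapters have their own names and signatures, which you should leave unchanged.

```python
# Hypothetical sketch: none of these names come from the assignment code.
# `compute_mean` stands in for a function you wrote yourself, and
# `run_compute_mean` stands in for an adapter stub in tests/adapters.py.


def compute_mean(values: list[float]) -> float:
    """Your own implementation, living anywhere in your package."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(values) / len(values)


def run_compute_mean_unimplemented(values: list[float]) -> float:
    """An adapter before it is wired up: any test that calls it fails."""
    raise NotImplementedError


def run_compute_mean(values: list[float]) -> float:
    """The same adapter after wiring: it only forwards to your code."""
    return compute_mean(values)
```

Keeping each adapter a thin forwarding call means the tests exercise your implementation directly, and you remain free to organize your own modules however you like.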