All changes we make to the assignment code or PDF will be documented in this file.
- handout: add guidance on the size of D_b in the expert iteration experiment
- code: add data for optional assignment
- code: for optional assignment, update alpaca_eval to Llama 3.3 70B Instruct judge
- handout: add optional assignment on safety, instruction tuning, and RLHF
- code: change the masked normalize constant so it does not equal seqlen; add more SFT test coverage
- code: 2025 assignment on SFT, Expert Iteration, and GRPO with verified rewards on MATH
- handout: 2025 assignment on SFT, Expert Iteration, and GRPO with verified rewards on MATH
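For reference, the masked-normalize change above concerns an operation that sums a tensor over unmasked positions and divides by a fixed constant. A minimal pure-Python sketch (the assignment's real implementation is tensor-based and its exact signature may differ):

```python
def masked_normalize(values, mask, normalize_constant):
    # Sum only the positions where mask is 1, then divide by the constant.
    # If the test constant equals the sequence length, an implementation
    # that wrongly divides by len(values) would still pass -- hence the
    # change to a constant that does not equal seqlen.
    total = sum(v for v, m in zip(values, mask) if m)
    return total / normalize_constant
```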
- code: fix AlpacaEval auto-evaluator to use local Llama 3 70B Instruct.
- code: add missing `evaluate_safety.py` script to `scripts`.
- code: add Llama 3 tokenizer as a fixture.
- code: fix DPO loss test
- code: make SFT dataset test stricter by comparing against expected output to help folks catch bugs.
- code: include prompts as text files in `cs336_alignment/prompts`.
- handout: fix typo in code example for writing AlpacaEval outputs.
- handout: provide more instructions on interpreting AlpacaEval annotations file.
- handout: give better default DPO hyperparameters
- handout: clarify prompt to use for the DPO loss (AlpacaEval prompt) and mention EOS token
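The hyperparameters above (notably beta) enter through the standard DPO objective, -log sigmoid(beta * delta), where delta is the difference between the policy-vs-reference log-prob margins of the chosen and rejected responses. A scalar sketch with illustrative names (the assignment computes this over batched sequence log-probs, including the EOS token as noted above):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Margin of the policy over the reference, chosen minus rejected.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```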
- handout: clarify that arrows in the prompts are line continuations, not line breaks
- handout: mention that we provide the prompts as text files at `cs336_alignment/prompts`.
- code: add MMLU, GSM8K, AlpacaEval, and SimpleSafetyTests data to `./data`.
- handout: explicitly set CUDA_HOME in FlashAttention-2 installation instructions.
- code: explicitly set CUDA_HOME in FlashAttention-2 installation instructions.
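The CUDA_HOME change above typically amounts to an export before building; a sketch of the install step, assuming a common toolkit path that may differ on your machine:

```shell
# Point the FlashAttention-2 build at the CUDA toolkit.
export CUDA_HOME=/usr/local/cuda
pip install flash-attn --no-build-isolation
```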
Initial release