
Added new example for Anthropic HH / Bias EK-FAC score computation for Pythia models (base and SFTed) #46

Merged
pomonam merged 6 commits into pomonam:main from sevendaystoglory:main on Jun 18, 2025

@sevendaystoglory
Contributor

This folder shows one minimal, self-contained example of how to use kronfluence to:

  1. Fit EK-FAC influence factors on a large language model (the 410M-parameter Pythia model) using a subset of 10k Anthropic HH training samples.
  2. Compute pairwise influence scores between that training set and the "Stereotypical Bias" evaluation set.
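To make the two steps concrete, here is a toy NumPy sketch of what EK-FAC computes for a single linear layer `y = W @ a`. It is an illustration under stated assumptions, not kronfluence's implementation: kronfluence repeats this for every tracked module of the Pythia model and also handles token-level gradients, batching, and saving factors to disk. The helper names (`fit_ekfac`, `precondition`, `pairwise_scores`), the random data, and the damping value are all hypothetical.

```python
import numpy as np


def fit_ekfac(acts, out_grads):
    """Fit EK-FAC factors for one linear layer y = W @ a (hypothetical helper).

    acts:      (n, d_in)  layer inputs a_i
    out_grads: (n, d_out) output gradients g_i = dL_i/dy_i
    The per-example weight gradient is G_i = outer(g_i, a_i), shape (d_out, d_in).
    """
    n = acts.shape[0]
    # K-FAC factors: A = E[a a^T] (inputs) and S = E[g g^T] (output gradients).
    A = acts.T @ acts / n
    S = out_grads.T @ out_grads / n
    # Eigenbases of the two symmetric factors.
    _, Q_A = np.linalg.eigh(A)
    _, Q_S = np.linalg.eigh(S)
    # Rotate into the Kronecker eigenbasis: Q_S^T G_i Q_A = outer(Q_S^T g_i, Q_A^T a_i).
    g_rot = out_grads @ Q_S
    a_rot = acts @ Q_A
    # Eigenvalue correction: mean squared value of each rotated gradient entry.
    lambdas = (g_rot ** 2).T @ (a_rot ** 2) / n  # (d_out, d_in)
    return Q_S, Q_A, lambdas


def precondition(G, Q_S, Q_A, lambdas, damping):
    """Approximate (H + damping * I)^{-1} G using the EK-FAC estimate of H."""
    rotated = Q_S.T @ G @ Q_A
    return Q_S @ (rotated / (lambdas + damping)) @ Q_A.T


def pairwise_scores(query_grads, train_grads, Q_S, Q_A, lambdas, damping=1e-3):
    """scores[q, t] = <precondition(G_q), G_t> (Frobenius inner product)."""
    pre = np.stack(
        [precondition(G, Q_S, Q_A, lambdas, damping) for G in query_grads]
    )
    return np.einsum("qij,tij->qt", pre, train_grads)


# Toy usage: random data stands in for a real layer's activations and gradients.
rng = np.random.default_rng(0)
n, d_in, d_out = 64, 5, 3
acts = rng.normal(size=(n, d_in))
out_grads = rng.normal(size=(n, d_out))
Q_S, Q_A, lambdas = fit_ekfac(acts, out_grads)          # step 1: fit factors
train_grads = np.einsum("ni,nj->nij", out_grads, acts)  # per-example G_i
query_grads = train_grads[:4]  # hypothetical: reuse 4 examples as queries
scores = pairwise_scores(query_grads, train_grads, Q_S, Q_A, lambdas)  # step 2
print(scores.shape)  # prints (4, 64)
```

The eigenvalue correction is what separates EK-FAC from plain K-FAC: instead of multiplying the eigenvalues of `A` and `S`, it re-estimates the second moment of each gradient coordinate in the Kronecker eigenbasis, which is the best diagonal fit (in Frobenius norm) within that basis.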

The goal of the example is to be copy-paste simple: after installing the requirements, a single command will produce both the factors and the influence scores.

Layout

AnthropicHH-Bias/
 ├─ fit_all_factors.py         # Step-1 · compute EKFAC factors
 ├─ compute_pairwise_scores.py # Step-2 · compute pairwise scores
 ├─ task.py                    # Loss + measurement definitions
 ├─ utils.py                   # Helper functions (dataset loading, metrics…)
 ├─ SFT_Trainer_Lora.py        # Optional LoRA fine-tuning script
 ├─ requirements.txt           # Extra runtime deps (torch + transformers …)
 └─ README.md                 
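The loss definitions in `task.py` are not reproduced in this PR. A common pattern in kronfluence's language-model examples, assumed here, is a summed (not averaged) next-token cross-entropy that skips label positions set to `-100`. The NumPy sketch below shows only that arithmetic; the real `task.py` would compute it on PyTorch logits from the Pythia model, and whether the "Stereotypical Bias" measurement reuses the same loss is not stated. `summed_next_token_loss` and `IGNORE_INDEX` are hypothetical names.

```python
import numpy as np

IGNORE_INDEX = -100  # assumed Hugging Face convention for masked label positions


def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def summed_next_token_loss(logits, labels):
    """Summed next-token cross-entropy, skipping IGNORE_INDEX labels (hypothetical).

    logits: (batch, seq_len, vocab); labels: (batch, seq_len) token ids.
    Position t predicts token t + 1, so logits and labels are shifted by one.
    """
    logits = logits[:, :-1, :]
    targets = labels[:, 1:]
    mask = targets != IGNORE_INDEX
    logp = log_softmax(logits)
    safe = np.where(mask, targets, 0)  # placeholder index; masked out below
    picked = np.take_along_axis(logp, safe[..., None], axis=-1)[..., 0]
    return float(-(picked * mask).sum())


# Toy usage with random logits over an 11-token vocabulary.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 5, 11))
labels = rng.integers(0, 11, size=(2, 5))
labels[1, 3:] = IGNORE_INDEX  # hypothetical padding at the end of the second sequence
loss = summed_next_token_loss(logits, labels)
```

Summing rather than averaging keeps each sequence's gradient at the same scale regardless of batch size.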

@pomonam pomonam self-requested a review June 18, 2025 04:50
@pomonam
Owner

pomonam commented Jun 18, 2025

Thank you! This looks great! Would it be easy to fix some ruff / linting issues? I'm happy to merge this.

@pomonam pomonam merged commit da63b88 into pomonam:main Jun 18, 2025
3 of 4 checks passed

2 participants