
Issue 39 support othellogpt in SAELens#317

Merged
chanind merged 29 commits into decoderesearch:main from decandido:issues/issue-39-support-othellogpt
Oct 15, 2024

Conversation

@decandido
Contributor

@decandido decandido commented Oct 3, 2024

Description

Adding support for OthelloGPT. Adding logic to train on sequences with a start and end position offset similar to PR 294. Adding tests for the added logic and to benchmark the OthelloGPT SAETrainingRunner.

Fixes #39

Type of change


  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

You have tested formatting, typing and unit tests (acceptance tests not currently in use)

  • I have run make check-ci to check format and linting. (you can run make format to format code if needed.)

@jbloomAus
Contributor

Thanks @decandido and @zhenningdavidliu!

This has some overlap with #294. Specifically, I'd like to use the seqpos slicing instead of the start and end pos offsets. Other than that this looks good I think!

Requests:

  • [ ] Please rebase or merge with Support seqpos slicing #294 (which will also make sure the seqpos slice ends up in the config)
  • [ ] Optional: Do you want to make a short tutorial on training / using the OthelloGPT SAE?

Once that's done I'll accept :)

Collaborator

@chanind chanind left a comment


If I understand this correctly, these offset parameters only make sense in cases where the model is given sequences of a fixed length, and context_size is set to that length, e.g. OthelloGPT?


stacked_activations = torch.zeros((n_batches, n_context, 1, self.d_in))
# For some models, we might want to exclude some positions from the sequence to train on
training_context_slice = list(
Collaborator


Creating a list of indices like this is inefficient. It's better to create an actual slice object, like training_context_slice = slice(self.start_pos_offset, n_context - self.end_pos_offset). Alternatively, you can probably just do something like the following directly when selecting the specific indices:

stacked_activations[:, :, 0] = layerwise_activations[self.hook_name][
      :, start_pos:end_pos, self.hook_head_index
]
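The performance point above can be illustrated with a small, self-contained sketch. The shapes and the `start_pos_offset` / `end_pos_offset` names follow the review comment but are illustrative, not SAELens's actual internals:

```python
import torch

# Illustrative shapes: 4 sequences, a fixed context of 60 positions, d_in of 8.
n_batches, n_context, d_in = 4, 60, 8
layerwise_activations = torch.randn(n_batches, n_context, d_in)

start_pos_offset, end_pos_offset = 5, 5

# Inefficient: materializing a list of indices forces advanced indexing,
# which builds an index tensor and copies the selected data.
idx_list = list(range(start_pos_offset, n_context - end_pos_offset))
via_list = layerwise_activations[:, idx_list]

# Better: a slice object uses basic indexing, which returns a view
# of the original storage with no index tensor and no copy.
seq_slice = slice(start_pos_offset, n_context - end_pos_offset)
via_slice = layerwise_activations[:, seq_slice]

assert torch.equal(via_list, via_slice)
print(via_slice.shape)  # torch.Size([4, 50, 8])
```

Both spellings select the same positions; the slice version just avoids the intermediate index list and the copy that advanced indexing implies.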

Contributor Author


Hey David, thanks for your thorough review of our PR! As Joseph mentioned in his comment above, we've rebased on Callum's PR and added a few tests of our own. These problems are not present in the rebased code, but your comments were really helpful for @zhenningdavidliu and me! 🙏

@zhenningdavidliu zhenningdavidliu force-pushed the issues/issue-39-support-othellogpt branch from 7c333e8 to 1b39d82 Compare October 4, 2024 10:32
@decandido decandido marked this pull request as draft October 7, 2024 08:01
callummcdougall and others added 26 commits October 9, 2024 14:40
@decandido decandido force-pushed the issues/issue-39-support-othellogpt branch from 8539145 to 9130ff9 Compare October 9, 2024 12:41
@decandido
Contributor Author

Requests:

  • [ ] Please rebase or merge with Support seqpos slicing #294 (which will also make sure the seqpos slice ends up in the config)
  • [ ] Optional: Do you want to make a short tutorial on training / using the OthelloGPT SAE?

Thanks @jbloomAus for reviewing our PR! We've rebased on #294 and removed the start and end pos offsets.

We also added a short script to train SAEs on othelloGPT (scripts/training_a_sparse_autoencoder_othelloGPT.py). Let us know if that's what you were looking for!
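For readers unfamiliar with the seqpos approach the PR was rebased onto, here is a minimal, hypothetical sketch of what slicing fixed-length OthelloGPT-style sequences down to the trainable positions might look like. The 60-move game length and the `seqpos` bounds are illustrative assumptions; see the actual training script for SAELens's real config fields:

```python
import torch

# OthelloGPT-style setup: every game is a fixed-length sequence of 60 moves,
# but we may only want to train the SAE on a middle window of positions.
n_games, game_len, d_model = 2, 60, 16
activations = torch.randn(n_games, game_len, d_model)

# Hypothetical seqpos bounds: keep positions 5..54, dropping the first
# and last 5 positions of each game.
seqpos = (5, 55)
train_acts = activations[:, seqpos[0]:seqpos[1]]

# Flatten (game, position) into a single batch of SAE training examples.
train_batch = train_acts.reshape(-1, d_model)
print(train_batch.shape)  # torch.Size([100, 16])
```

Expressing the window as a single (start, stop) pair in the config, rather than two separate offsets, is what makes it easy to store alongside the SAE and reapply at load time.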

@decandido decandido marked this pull request as ready for review October 9, 2024 13:22
@callummcdougall
Contributor

Hey, thanks for all your awesome work here! Could I possibly bump this PR? It would also be super useful for the final bits of the ARENA material to run smoothly (-:

Collaborator

@chanind chanind left a comment


LGTM! Will leave it to Joseph to merge

@callummcdougall
Contributor

Thanks! (I'm actually also realizing that the context length issue wasn't part of this PR but an earlier one, and that was the main thing making things a bit clunky, so it's less important than I thought, but it would still be great!)

@chanind chanind merged commit 7047f87 into decoderesearch:main Oct 15, 2024
xXCoolinXx pushed a commit to xXCoolinXx/SAELens that referenced this pull request Feb 23, 2026
* support seqpos slicing

* add basic tests, ensure it's in the SAE config

* format

* fix tests

* fix tests 2

* fix: Changing the activations store to handle context sizes smaller than dataset lengths for tokenized datasets.

* fix: Found bug which allowed for negative context lengths. Removed the bug

* Update pytest to test new logic for context size of tokenized dataset

* Reformat code to pass CI tests

* Add warning for when context_size is smaller than the dataset context_size

* feat: adding support for start and end position offsets for token sequences

* Add start_pos_offset and end_pos_offset to the SAERunnerConfig

* Add tests for start_pos_offset and end_pos_offset in the LanguageModelSAERunnerConfig

* feat: start and end position offset support for SAELens.

* Add test for CacheActivationsRunnerConfig with start and end pos offset

* Test cache activation runner with valid start and end pos offset

* feat: Enabling loading of start and end pos offset from saes. Adding tests for this

* fix: Renaming variables and a test

* adds test for position offsets for saes

* reformats files with black

* Add start and end pos offset to the base sae dict

* fix test for sae training runner config with position offsets

* add a benchmark test to train an SAE on OthelloGPT

* Remove double import from typing

* change dead_feature_window to int

* remove print statements from test file

* Rebase on seqpos tuple implementation and remove start/end pos offset

* Reword docstring for seqpos to be clearer.

* Added script to train an SAE on othelloGPT

---------

Co-authored-by: callummcdougall <cal.s.mcdougall@gmail.com>
Co-authored-by: jbloomAus <jbloomaus@gmail.com>
Co-authored-by: liuman <zhenninghimme@gmail.com>
