
Add parametrized steerable model tests (Transformers backend) #1774

Closed
kudos07 wants to merge 27 commits into dottxt-ai:main from kudos07:test/parametrized-steerable

Conversation

@kudos07
Contributor

@kudos07 kudos07 commented Nov 1, 2025

Summary

This draft PR introduces a parametrized test suite for steerable models
covering multiple Hugging Face models under the Transformers backend.

It focuses on one aspect of issue #1717:

  • Expands initialization and inference coverage to multiple models.
  • Runs each inference several times to surface flaky behavior.
  • Adds a shared Hugging Face cache fixture to prevent rate-limit errors.
  • Includes a backend parameter to allow future vLLM/llama_cpp coverage.

Notes

  • vLLM tests are skipped on Windows; CI can re-enable them.
  • This is a draft PR as discussed with @RobinPicard to align on test organization before expanding further.
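The backend matrix and the Windows skip described in the summary can be sketched roughly as below. This is an illustrative pattern, not the PR's literal code: the model names, the `build_matrix` helper, and the test body are assumptions for the sake of the example.

```python
import sys

import pytest

# Illustrative model/backend matrix; the PR's actual checkpoints may differ.
MODELS = ["EleutherAI/pythia-70m-deduped", "hf-internal-testing/tiny-random-gpt2"]
BACKENDS = [
    pytest.param("transformers"),
    pytest.param(
        "vllm",
        marks=pytest.mark.skipif(
            sys.platform == "win32", reason="vLLM is unsupported on Windows"
        ),
    ),
]


def build_matrix(models, backends):
    """Cross every model with every backend, as stacked parametrize marks would."""
    return [(m, b) for m in models for b in backends]


@pytest.mark.parametrize("backend", BACKENDS)
@pytest.mark.parametrize("model_name", MODELS)
def test_smoke(model_name, backend):
    # Placeholder for: initialize the model and run one standard constrained
    # generation, repeated a few times to surface flaky behavior.
    assert model_name and backend
```

Using `pytest.param(..., marks=...)` keeps the skip condition attached to the single vLLM entry instead of skipping the whole test on Windows.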

@RobinPicard
Contributor

Thanks for opening a PR @kudos07! I think it would make sense to keep these tests separated by model, since the test code in your commit is written for Transformers specifically. What we could do is have, in the test file for each model, one specific model used for all the feature tests, plus a parametrized test like yours that initializes many models and runs a standard constrained generation.

To me, one of the main interests of that would be to check whether we run into tokenizer issues with various models, as I've noticed that's often where problems appear.

@kudos07
Contributor Author

kudos07 commented Nov 13, 2025

Update based on review feedback:

  1. Moved the parametrized test into the Transformers backend test suite at tests/models/test_transformers_parametrized.py, since this test targets the Transformers model implementation specifically.

  2. Kept the parametrized set of Hugging Face checkpoints to validate initialization and basic constrained generation across multiple tokenizers.

  3. Added a small tokenizer sanity check (assert tokenizer.vocab_size > 0) to address the note that tokenizer inconsistencies often cause failures.

  4. Left the vLLM entry in the parametrization but skipped on Windows, documenting intended future backend coverage once platform support allows it.

Let me know if you'd prefer this test split differently or want it integrated with any existing Transformers test files.
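The tokenizer sanity check in point 3 is a one-liner, but the idea is to fail fast before any generation runs. A minimal sketch, using a stand-in object rather than a real Hugging Face tokenizer so it needs no network access; the `check_tokenizer` helper and its values are hypothetical:

```python
from types import SimpleNamespace


def check_tokenizer(tokenizer):
    """Fail fast on a degenerate tokenizer before running any generation.

    Mirrors the PR's `assert tokenizer.vocab_size > 0` sanity check.
    """
    assert tokenizer.vocab_size > 0, "tokenizer reports an empty vocabulary"
    return True


# Stand-in for a loaded AutoTokenizer (illustrative vocab size).
fake_tokenizer = SimpleNamespace(vocab_size=50257)
check_tokenizer(fake_tokenizer)  # passes silently
```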

@kudos07
Contributor Author

kudos07 commented Jan 10, 2026

Hi @RobinPicard

I’ve pushed an update addressing the feedback from #1774.

What changed

  • Moved the parametrized test into the Transformers backend test suite at tests/models/test_transformers_parametrized.py, since it targets the Transformers implementation specifically.

  • Kept feature-level coverage in test_transformers.py scoped to a single canonical model, while adding a parametrized smoke test across multiple Hugging Face checkpoints.

  • The parametrized test intentionally focuses on initialization + one standard constrained generation, with the goal of surfacing tokenizer-related issues across models (e.g. padding / special-token handling and constrained decoding behavior), which has often been a source of failures.

  • Fixed padding consistently at the HF layer (fallback to eos_token) before wrapping with Outlines to make tokenizer incompatibilities explicit.

  • Added a session-scoped Hugging Face cache in tests/conftest.py to avoid repeated downloads and rate-limit issues in CI.

  • Limited the model set to small checkpoints to keep CI runtime reasonable.

This keeps the scope tight while directly targeting tokenizer robustness across different models, as discussed.
Happy to adjust the split or integrate this differently if you’d prefer before expanding coverage further.
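The padding fallback mentioned above follows a common Hugging Face idiom: many causal-LM tokenizers ship without a `pad_token`, and reusing `eos_token` is the usual fix. A sketch with a stand-in tokenizer object (loading a real checkpoint would need network access); the `ensure_padding` helper and the token values are illustrative, not the PR's exact code:

```python
from types import SimpleNamespace


def ensure_padding(tokenizer):
    """Make padding explicit before wrapping the tokenizer for constrained decoding.

    Causal-LM tokenizers (e.g. the GPT-2 family) often ship with pad_token=None;
    falling back to eos_token makes incompatibilities explicit up front.
    """
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer.pad_token


# Stand-in for an AutoTokenizer that ships without a pad token.
tok = SimpleNamespace(pad_token=None, eos_token="<|endoftext|>")
ensure_padding(tok)
```

The same two lines work unchanged on a real `AutoTokenizer`, since it exposes the same `pad_token` and `eos_token` attributes.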

@kudos07
Contributor Author

kudos07 commented Jan 20, 2026

Hi @RobinPicard, whenever you have a moment, please take a look. If any further changes are needed, I’m happy to make them!

@RobinPicard
Contributor

Can you rebase your branch on main and squash the commits, please? Otherwise it's fine; there are just two things I would change:

  • put the parametrized tests in the existing test_transformers.py file
  • remove what was added to conftest since there's already model caching in place

@kudos07
Contributor Author

kudos07 commented Jan 23, 2026

Hello @RobinPicard,

I’ve updated the branch to move the tests into test_transformers.py and removed the conftest changes.

A couple of quick clarifications to finalize the structure:

  • Model Coverage: I’m currently using two models (TEST_MODEL and pythia-70m-deduped) to catch tokenizer-specific edge cases across architectures. Do you prefer keeping this variety, or should I stick to just one?

  • Code Style: For the second model, would you prefer I define a new global variable at the top (e.g., TEST_MODEL_SECONDARY), or is inlining the string directly in the parametrize decorator okay with you?

Let me know and I'll adjust accordingly

@RobinPicard
Contributor

I’m currently using two models (TEST_MODEL and pythia-70m-deduped) to catch tokenizer-specific edge cases across architectures. Do you prefer keeping this variety, or should I stick to just one?

I think it would make sense to have a main TEST_MODEL used for all tests + a matrix of more varied models for the parametrized test

For the second model, would you prefer I define a new global variable at the top (e.g., TEST_MODEL_SECONDARY), or is inlining the string directly in the parametrize decorator okay with you?

Inlining it in the parametrize decorator is fine

@kudos07
Contributor Author

kudos07 commented Jan 25, 2026

Thanks for the feedback @RobinPicard!

I’ve moved the parametrized smoke test into test_transformers.py, removed the extra conftest changes, and kept the parametrization limited to TEST_MODEL plus one additional small model (pythia-70m-deduped) to exercise tokenizer differences.

All tests are passing locally.
I’m relatively new to contributing here, so apologies if there was any noise in earlier iterations - really appreciate the guidance.

Happy to adjust anything further if needed.

@RobinPicard
Contributor

No worries at all @kudos07! You just need to rebase your branch on main now and we'll be ready to merge

@kudos07
Contributor Author

kudos07 commented Jan 26, 2026

Hi @RobinPicard — really sorry for the churn and noise earlier. This is my first time navigating a larger rebase / history cleanup here, and I appreciate your patience and guidance throughout.

I’ve now replayed the change cleanly on top of main in this PR:
#1814

The parametrized test lives in test_transformers.py, and there are no conftest.py changes.

Thanks again for the help, and apologies for the back-and-forth.

@kudos07 kudos07 closed this Jan 27, 2026
@kudos07 kudos07 reopened this Jan 27, 2026
@kudos07
Contributor Author

kudos07 commented Feb 2, 2026

Closing this draft PR since the work has been replayed cleanly and merged in #1814. Thanks again @RobinPicard for the guidance and review.

@kudos07 kudos07 closed this Feb 2, 2026