
Add parametrized steerable model tests (Transformers backend) #1774

Closed
kudos07 wants to merge 27 commits into dottxt-ai:main from kudos07:test/parametrized-steerable

Conversation

@kudos07
Contributor

@kudos07 kudos07 commented Nov 1, 2025

Summary

This draft PR introduces a parametrized test suite for steerable models
covering multiple Hugging Face models under the Transformers backend.

It focuses on one aspect of issue #1717:

  • Expands initialization and inference coverage to multiple models.
  • Runs each inference several times to surface flaky behavior.
  • Adds a shared Hugging Face cache fixture to prevent rate-limit errors.
  • Includes a backend parameter to allow future vLLM/llama_cpp coverage.

Notes

  • vLLM tests are skipped on Windows; CI can re-enable them.
  • This is a draft PR as discussed with @RobinPicard to align on test organization before expanding further.
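The backend matrix and the Windows skip described in the summary can be sketched roughly as below. This is an illustrative pattern, not the PR's literal code: the model names, the `build_matrix` helper, and the test body are assumptions for the sake of the example.

```python
import sys

import pytest

# Illustrative model/backend matrix; the PR's actual checkpoints may differ.
MODELS = ["EleutherAI/pythia-70m-deduped", "hf-internal-testing/tiny-random-gpt2"]
BACKENDS = [
    pytest.param("transformers"),
    pytest.param(
        "vllm",
        marks=pytest.mark.skipif(
            sys.platform == "win32", reason="vLLM is unsupported on Windows"
        ),
    ),
]


def build_matrix(models, backends):
    """Cross every model with every backend, as stacked parametrize marks would."""
    return [(m, b) for m in models for b in backends]


@pytest.mark.parametrize("backend", BACKENDS)
@pytest.mark.parametrize("model_name", MODELS)
def test_smoke(model_name, backend):
    # Placeholder for: initialize the model and run one standard constrained
    # generation, repeated a few times to surface flaky behavior.
    assert model_name and backend
```

Using `pytest.param(..., marks=...)` keeps the skip condition attached to the single vLLM entry instead of skipping the whole test on Windows.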

@RobinPicard
Contributor

Thanks for opening a PR @kudos07! I think it would make sense to keep these tests separated by model, since the test code in your commit is written for Transformers specifically. What we could do is have, in the test file for each model, one specific model used for all the feature tests, plus a parametrized test like yours that initializes many models and runs a standard constrained generation.

To me, one of the main interests of that would be to check whether we run into tokenizer issues with various models, as I've noticed that's often where problems appear.

@kudos07
Contributor Author

kudos07 commented Nov 13, 2025

Update based on review feedback:

  1. Moved the parametrized test into the Transformers backend test suite at tests/models/test_transformers_parametrized.py, since this test targets the Transformers model implementation specifically.

  2. Kept the parametrized set of Hugging Face checkpoints to validate initialization and basic constrained generation across multiple tokenizers.

  3. Added a small tokenizer sanity check (assert tokenizer.vocab_size > 0) to address the note that tokenizer inconsistencies often cause failures.

  4. Left the vLLM entry in the parametrization but skipped on Windows, documenting intended future backend coverage once platform support allows it.

Let me know if you'd prefer this test split differently or want it integrated with any existing Transformers test files.
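The tokenizer sanity check in point 3 is a one-liner, but the idea is to fail fast before any generation runs. A minimal sketch, using a stand-in object rather than a real Hugging Face tokenizer so it needs no network access; the `check_tokenizer` helper and its values are hypothetical:

```python
from types import SimpleNamespace


def check_tokenizer(tokenizer):
    """Fail fast on a degenerate tokenizer before running any generation.

    Mirrors the PR's `assert tokenizer.vocab_size > 0` sanity check.
    """
    assert tokenizer.vocab_size > 0, "tokenizer reports an empty vocabulary"
    return True


# Stand-in for a loaded AutoTokenizer (illustrative vocab size).
fake_tokenizer = SimpleNamespace(vocab_size=50257)
check_tokenizer(fake_tokenizer)  # passes silently
```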

@kudos07
Contributor Author

kudos07 commented Jan 10, 2026

Hi @RobinPicard

I’ve pushed an update addressing the feedback from #1774.

What changed

  • Moved the parametrized test into the Transformers backend test suite at tests/models/test_transformers_parametrized.py, since it targets the Transformers implementation specifically.

  • Kept feature-level coverage in test_transformers.py scoped to a single canonical model, while adding a parametrized smoke test across multiple Hugging Face checkpoints.

  • The parametrized test intentionally focuses on initialization + one standard constrained generation, with the goal of surfacing tokenizer-related issues across models (e.g. padding / special-token handling and constrained decoding behavior), which has often been a source of failures.

  • Fixed padding consistently at the HF layer (fallback to eos_token) before wrapping with Outlines to make tokenizer incompatibilities explicit.

  • Added a session-scoped Hugging Face cache in tests/conftest.py to avoid repeated downloads and rate-limit issues in CI.

  • Limited the model set to small checkpoints to keep CI runtime reasonable.

This keeps the scope tight while directly targeting tokenizer robustness across different models, as discussed.
Happy to adjust the split or integrate this differently if you’d prefer before expanding coverage further.
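The padding fallback mentioned above follows a common Hugging Face idiom: many causal-LM tokenizers ship without a `pad_token`, and reusing `eos_token` is the usual fix. A sketch with a stand-in tokenizer object (loading a real checkpoint would need network access); the `ensure_padding` helper and the token values are illustrative, not the PR's exact code:

```python
from types import SimpleNamespace


def ensure_padding(tokenizer):
    """Make padding explicit before wrapping the tokenizer for constrained decoding.

    Causal-LM tokenizers (e.g. the GPT-2 family) often ship with pad_token=None;
    falling back to eos_token makes incompatibilities explicit up front.
    """
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer.pad_token


# Stand-in for an AutoTokenizer that ships without a pad token.
tok = SimpleNamespace(pad_token=None, eos_token="<|endoftext|>")
ensure_padding(tok)
```

The same two lines work unchanged on a real `AutoTokenizer`, since it exposes the same `pad_token` and `eos_token` attributes.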

@kudos07
Contributor Author

kudos07 commented Jan 20, 2026

Hi @RobinPicard, whenever you have a moment, please take a look. If any further changes are needed, I’m happy to make them!

@RobinPicard
Contributor

Can you rebase your branch on main and squash the commits, please? Otherwise it's fine; there are just two things I would change:

  • put the parametrized tests in the existing test_transformers.py file
  • remove what was added to conftest since there's already model caching in place

@kudos07
Contributor Author

kudos07 commented Jan 23, 2026

Hello @RobinPicard,

I’ve updated the branch to move the tests into test_transformers.py and removed the conftest changes.

A couple of quick clarifications to finalize the structure:

  • Model Coverage: I’m currently using two models (TEST_MODEL and pythia-70m-deduped) to catch tokenizer-specific edge cases across architectures. Do you prefer keeping this variety, or should I stick to just one?

  • Code Style: For the second model, would you prefer I define a new global variable at the top (e.g., TEST_MODEL_SECONDARY), or is inlining the string directly in the parametrize decorator okay with you?

Let me know and I'll adjust accordingly

@RobinPicard
Contributor

I’m currently using two models (TEST_MODEL and pythia-70m-deduped) to catch tokenizer-specific edge cases across architectures. Do you prefer keeping this variety, or should I stick to just one?

I think it would make sense to have a main TEST_MODEL used for all tests + a matrix of more varied models for the parametrized test

For the second model, would you prefer I define a new global variable at the top (e.g., TEST_MODEL_SECONDARY), or is inlining the string directly in the parametrize decorator okay with you?

Inlining it in the parametrize decorator is fine

@kudos07
Contributor Author

kudos07 commented Jan 25, 2026

Thanks for the feedback @RobinPicard!

I’ve moved the parametrized smoke test into test_transformers.py, removed the extra conftest changes, and kept the parametrization limited to TEST_MODEL plus one additional small model (pythia-70m-deduped) to exercise tokenizer differences.

All tests are passing locally.
I’m relatively new to contributing here, so apologies if there was any noise in earlier iterations - really appreciate the guidance.

Happy to adjust anything further if needed.

@RobinPicard
Contributor

No worries at all @kudos07! You just need to rebase your branch on main now and we'll be ready to merge

@kudos07
Contributor Author

kudos07 commented Jan 26, 2026

Hi @RobinPicard — really sorry for the churn and noise earlier. This is my first time navigating a larger rebase / history cleanup here, and I appreciate your patience and guidance throughout.

I’ve now replayed the change cleanly on top of main in this PR:
#1814

The parametrized test lives in test_transformers.py, and there are no conftest.py changes.

Thanks again for the help, and apologies for the back-and-forth.

@kudos07 kudos07 closed this Jan 27, 2026
@kudos07 kudos07 reopened this Jan 27, 2026
@kudos07
Contributor Author

kudos07 commented Feb 2, 2026

Closing this draft PR since the work has been replayed cleanly and merged in #1814. Thanks again @RobinPicard for the guidance and review.

@kudos07 kudos07 closed this Feb 2, 2026