Add parametrized Transformers smoke test for tokenizer robustness #1814

Merged
RobinPicard merged 2 commits into dottxt-ai:main from kudos07:test/parametrized-steerable-clean on Feb 2, 2026

kudos07 (Contributor) commented Jan 26, 2026

This PR replays the parametrized Transformers smoke test cleanly on top of main.

It keeps the test in test_transformers.py, avoids any conftest.py changes, and exercises tokenizer differences using TEST_MODEL plus a small secondary model.

Supersedes #1774.

RobinPicard (Contributor) left a comment

Almost there! Thanks for your perseverance!


hf_tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

if hf_tokenizer.pad_token is None:

To me, the purpose of this test is to fail if something is wrong in that domain. If adjustments like that must be made for some tokenizers/models, they should live in the library.
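The point here is that tokenizer workarounds such as the `pad_token` fallback belong in the library, so the test can expose tokenizers that still need them. A minimal sketch of what a library-side fallback could look like, assuming the common Hugging Face convention of reusing the EOS token for padding. `ensure_pad_token` and `StubTokenizer` are hypothetical names used for illustration, not Outlines' actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubTokenizer:
    """Hypothetical stand-in exposing only the attributes the fallback reads."""
    eos_token: str
    pad_token: Optional[str] = None


def ensure_pad_token(tokenizer) -> None:
    """Hypothetical library-side helper: reuse EOS as the pad token if none is set.

    Doing this once inside the library gives every caller (tests included) the
    same behavior, instead of each test patching the tokenizer itself.
    """
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token


# A tokenizer without a pad token gets EOS as its pad token.
tok = StubTokenizer(eos_token="<|endoftext|>")
ensure_pad_token(tok)

# A tokenizer that already defines a pad token is left untouched.
tok2 = StubTokenizer(eos_token="</s>", pad_token="<pad>")
ensure_pad_token(tok2)
```

With the fallback centralized like this, the smoke test can load each tokenizer as-is and fail loudly if the library does not handle it.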

prompt,
constraint,
max_new_tokens=5,
do_sample=False,

I don't think it makes sense to loop if we then use do_sample=False.
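With `do_sample=False` (and the default single beam), Transformers' `generate` performs greedy decoding: at each step it takes the highest-scoring token, so repeating the same call on the same inputs should yield the same output, and a loop adds no coverage. A minimal sketch of that property, using a hypothetical deterministic `toy_logits` function in place of a real model:

```python
import math

# Hypothetical toy "model": maps the tokens so far to logits over a tiny
# vocabulary. It depends only on its input, so it is fully deterministic.
VOCAB_SIZE = 4


def toy_logits(tokens: list[int]) -> list[float]:
    """Return pseudo-logits that depend only on the input tokens."""
    seed = sum(tokens) + len(tokens)
    return [math.sin(seed * (i + 1)) for i in range(VOCAB_SIZE)]


def greedy_decode(prompt: list[int], max_new_tokens: int) -> list[int]:
    """Greedy decoding (the do_sample=False case): always pick the argmax token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = toy_logits(tokens)
        next_token = max(range(VOCAB_SIZE), key=lambda i: logits[i])
        tokens.append(next_token)
    return tokens[len(prompt):]


# Repeating a greedy run gives the same tokens every time, so looping over
# identical runs checks nothing that a single run doesn't already check.
runs = [greedy_decode([1, 2, 3], max_new_tokens=5) for _ in range(3)]
assert all(run == runs[0] for run in runs)
```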

kudos07 force-pushed the test/parametrized-steerable-clean branch from d46d799 to 48c8a38 on January 28, 2026 09:14
kudos07 (Contributor, Author) commented Jan 28, 2026

Thanks for the clarification, @RobinPicard. I've removed the tokenizer padding adjustment from the test and simplified it to a single deterministic run. Pushed an update.

RobinPicard force-pushed the test/parametrized-steerable-clean branch from 48c8a38 to 5abef08 on January 28, 2026 09:19
kudos07 force-pushed the test/parametrized-steerable-clean branch from daa8f6e to 3d2dcc5 on January 31, 2026 03:16
kudos07 requested a review from RobinPicard on February 2, 2026 08:03
RobinPicard merged commit d820bc0 into dottxt-ai:main on Feb 2, 2026
6 checks passed

2 participants