Improve reporting of deserializer validation errors #550

dbutenhof · 2026-01-22T20:14:04Z

Summary

I was looking for a "dead simple" problem just to get my feet damp. Issue #205 stood out. (A comment on that issue includes some detailed analysis.)

Deserialization of some parameters, such as --data, takes place during run rather than during Click option processing. A failure raises a validation exception, which the Click CLI infrastructure treats as unexpected, so the user gets a full traceback that is rarely useful.

This change intercepts internal validation error exceptions and raises a Click BadParameter exception encapsulating the validation error text. This will be reported without traceback.

Details

This wraps the asyncio.run call that starts benchmarking (and runs the deserializers) in a try block. The block converts the deserializer's validation errors (ValidationError and, for completeness, ValueError) into click.BadParameter, so that Click prints a usage message and suppresses the traceback that would otherwise obscure the error text.
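As a rough illustration of the pattern described here (not the actual code from src/guidellm/__main__.py), the sketch below wraps asyncio.run in a try block and re-raises as click.BadParameter. The coroutine run_benchmark and the command run are hypothetical stand-ins, and only ValueError is caught to keep the example small.

```python
import asyncio

import click
from click.testing import CliRunner


async def run_benchmark(data: str) -> None:
    # Hypothetical stand-in for the benchmark coroutine: it fails the way a
    # deserializer might when --data matches no known input format.
    if "=" in data:
        raise ValueError(f"Data deserialization failed for {data}")


@click.command()
@click.option("--data", required=True)
def run(data: str) -> None:
    try:
        asyncio.run(run_benchmark(data))
    except ValueError as err:
        # BadParameter is a Click usage error, so Click prints the usage
        # line plus the message and exits without showing a traceback.
        raise click.BadParameter(str(err)) from err
    click.echo("benchmark finished")
```

Invoking the command with something like CliRunner().invoke(run, ["--data", "prompt_tokens=256"]) should produce the usage text and an "Error: Invalid value: ..." line, in the same shape as the output shown in this description.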

dbutenho 14:54 badparam:fix/badparam guidellm mock-server --port 8004 &
dbutenho 11:02 badparam:fix/badparam guidellm benchmark --target http://localhost:8004/v1 --rate-type sweep --max-seconds 30 --model qwen3:4b --data "prompt_tokens=256,output_tokens=128"
Main |2026-01-22 11:03:07 -0500 ACCESS:   127.0.0.1:50704 GET http://localhost:8004/health 200 50 0.1ms
✔ OpenAIHTTPBackend backend validated with model qwen3:4b
  {'target': 'http://localhost:8004', 'model': 'qwen3:4b', 'timeout': 60.0, 'http2': True, 'follow_redirects': True, 'verify': False, 'openai_paths': {'health': 'health', 'models': 'v1/models', 'text_completions': 'v1/completions',
  'chat_completions': 'v1/chat/completions', 'audio_transcriptions': 'v1/audio/transcriptions', 'audio_translations': 'v1/audio/translations'}, 'validate_backend': {'method': 'GET', 'url': 'http://localhost:8004/health'}}          
✔ Processor resolved
  Using model 'qwen3:4b' as processor                                                                                                                                                                                                  
Usage: guidellm benchmark run [OPTIONS]
Try 'guidellm benchmark run --help' for help.

Error: Invalid value: Data deserialization failed, likely because the input doesn't match any of the input formats. See the 15 error(s) that occurred while attempting to deserialize the data prompt_tokens=256,output_tokens=128:
  - Deserializer 'huggingface': (HFValidationError) Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: 'prompt_tokens=256,output_tokens=128'.
  - Deserializer 'synthetic_text': (HFValidationError) Repo id must use alphanumeric chars, '-', '_' or '.'. The name cannot start or end with '-' or '.' and the maximum length is 96: 'qwen3:4b'.
  - Deserializer 'arrow_file': (DataNotSupportedError) Unsupported data for ArrowFileDatasetDeserializer, expected str or Path to a local .arrow file, got prompt_tokens=256,output_tokens=128
  - Deserializer 'csv_file': (DataNotSupportedError) Unsupported data for CSVFileDatasetDeserializer, expected str or Path to a valid local .csv file, got prompt_tokens=256,output_tokens=128
  - Deserializer 'db_file': (DataNotSupportedError) Unsupported data for DBFileDatasetDeserializer, expected str or Path to a local .db file, got prompt_tokens=256,output_tokens=128
  - Deserializer 'hdf5_file': (DataNotSupportedError) Unsupported data for HDF5FileDatasetDeserializer, expected str or Path to a local .hdf5 or .h5 file, got prompt_tokens=256,output_tokens=128
  - Deserializer 'in_memory_csv_str': (DataNotSupportedError) Unsupported data for InMemoryCsvDatasetDeserializer, expected CSV string, got <class 'str'>
  - Deserializer 'in_memory_dict': (DataNotSupportedError) Unsupported data for InMemoryDictDatasetDeserializer, expected dict[str, list], got prompt_tokens=256,output_tokens=128
  - Deserializer 'in_memory_dict_list': (DataNotSupportedError) Unsupported data for InMemoryDictListDatasetDeserializer, expected list of dicts, got prompt_tokens=256,output_tokens=128
  - Deserializer 'in_memory_item_list': (DataNotSupportedError) Unsupported data for InMemoryItemListDatasetDeserializer, expected list of primitive items, got prompt_tokens=256,output_tokens=128
  - Deserializer 'in_memory_json_str': (DataNotSupportedError) Unsupported data for InMemoryJsonStrDatasetDeserializer, expected JSON string with a list or dict of items, got prompt_tokens=256,output_tokens=128
  - Deserializer 'json_file': (DataNotSupportedError) Unsupported data for JSONFileDatasetDeserializer, expected str or Path to a local .json or .jsonl file, got prompt_tokens=256,output_tokens=128
  - Deserializer 'parquet_file': (DataNotSupportedError) Unsupported data for ParquetFileDatasetDeserializer, expected str or Path to a local .parquet file, got prompt_tokens=256,output_tokens=128
  - Deserializer 'tar_file': (DataNotSupportedError) Unsupported data for TarFileDatasetDeserializer, expected str or Path to a local .tar file, got prompt_tokens=256,output_tokens=128
  - Deserializer 'text_file': (DataNotSupportedError) Unsupported data for TextFileDatasetDeserializer, expected str or Path to a local .txt or .text file, got prompt_tokens=256,output_tokens=128

Test Plan

This only affects the output of a validation error in benchmark run startup, removing the traceback. While it's not impossible to imagine a CliRunner test for this case, the output analysis would be a bit tedious and it didn't seem worthwhile. (Feel free to tell me otherwise! Currently, test_main.py is rather light on content.)

Related Issues

This is "inspired by" #205 but possibly doesn't actually resolve it.

Resolves #

"I certify that all code in this PR is my own, except as noted below."

Use of AI

Includes AI-assisted code completion
Includes code generated by an AI application
Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Deserialization of some parameters like `--data` takes place during `run` rather than during option processing. It raises a `ValidationError`, but to the Click CLI infrastructure this is an unexpected exception and causes a full traceback which is not generally useful to the user.

This change intercepts `ValidationError` (and for completeness the standard Python `ValueError`) and raises a Click `BadParameter` exception encapsulating the validation error text. This will be reported without traceback.

Signed-off-by: David Butenhof <dbutenho@redhat.com>

sjmonson · 2026-01-22T21:36:33Z

src/guidellm/__main__.py

+            )
        )
-    )
+    except (ValidationError, ValueError) as err:


My biggest concern here is we lose debugging information for a large class of errors. ValueError is especially generic and could be emitted in any number of places. Maybe instead we define a custom error type for these runtime argument errors and catch that specific exception here?
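One possible reading of this suggestion, sketched under assumptions: a dedicated exception type marks runtime argument errors, and only that type is converted to click.BadParameter. The names StartupValidationError, load_data, and run are hypothetical and do not come from the guidellm codebase.

```python
import click
from click.testing import CliRunner


class StartupValidationError(Exception):
    """Hypothetical error type reserved for bad runtime arguments."""


def load_data(data: str) -> str:
    # Stand-in for deserialization. "bad" simulates unusable user input,
    # "bug" simulates an unrelated internal failure.
    if data == "bad":
        raise StartupValidationError("Data deserialization failed for 'bad'")
    if data == "bug":
        raise ValueError("unrelated internal failure")
    return data


@click.command()
@click.option("--data", required=True)
def run(data: str) -> None:
    try:
        load_data(data)
    except StartupValidationError as err:
        # Only the dedicated type becomes a usage error. A generic
        # ValueError still propagates, keeping its traceback for debugging.
        raise click.BadParameter(str(err)) from err
    click.echo("ok")
```

With this shape, the broad ValueError is no longer swallowed, which addresses the concern about losing debugging information for unrelated failures.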

Hmm... yeah, it actually looks like ValueError is used a lot more than I would have expected. And if any of them can occur after "initial startup" we wouldn't want to hide the traceback. So much for a "simple touch", but it was in any case an interesting excursion.

Yeah, we could use another exception for "static startup validation" errors. Perhaps a better solution would be to refactor the deserializers with a validation method that can be called during option parsing rather than let this sort of thing wait until run. And that'll require a lot more thought ...
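For the option-parsing idea, Click option callbacks are one mechanism that could host such a check. The sketch below is only an assumption about how that might look: validate_data is a hypothetical pre-check, and a real version would have to ask the deserializers whether they accept the value.

```python
import click
from click.testing import CliRunner


def validate_data(ctx: click.Context, param: click.Parameter, value: str) -> str:
    # Hypothetical cheap pre-check that Click runs while parsing options,
    # before the command body (and so before any benchmark) starts.
    if not value.strip():
        raise click.BadParameter("expected a file path, dataset id, or config string")
    return value


@click.command()
@click.option("--data", required=True, callback=validate_data)
def run(data: str) -> None:
    click.echo(f"benchmark started with {data}")
```

Because the error is raised inside the callback, Click attaches the option name to the message itself, so the user sees which parameter was rejected.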

dbutenhof requested review from jaredoconnell, markurtz and sjmonson January 22, 2026 20:14

dbutenhof self-assigned this Jan 22, 2026

dbutenhof added the bug label Jan 22, 2026

Merge branch 'main' into fix/badparam

21240f0

dbutenhof assigned dbutenhof and jaredoconnell and unassigned dbutenhof and jaredoconnell Jan 22, 2026

sjmonson requested changes Jan 22, 2026


dbutenhof assigned jaredoconnell and unassigned jaredoconnell Jan 22, 2026

dbutenhof marked this pull request as draft January 22, 2026 22:11

dbutenhof removed request for jaredoconnell and markurtz January 26, 2026 18:13

dbutenhof added this to the v0.7.0 milestone Jan 28, 2026
