
Allow user to keep record fields that failed to parse as null. #362

@danecor

Description


Priority Level

Medium (Nice to have)

Is your feature request related to a problem? Please describe.

Currently, a parse failure from a column such as LLMCodeColumnConfig logs a warning like the following, and the record is dropped:

[WARNING] ⚠️ Generation for record at index N failed. Will omit this record from the dataset.
  |----------
  | Cause: The provided output schema was unable to be parsed from model 'model' responses while running generation for column 'column'.
  | Solution: This is most likely temporary as we make additional attempts. If you continue to see more of this, simplify or modify the output schema for structured output and try again. If you are attempting token-intensive tasks like generations with high-reasoning effort, ensure that max_tokens in the model config is high enough to reach completion.
  |----------

The trace (if set up to be exported) is also lost.

Describe the solution you'd like

Ideally, users would have the option to keep these records in the dataset, with the failed JSON field set to null. There are three reasons for this:

  1. Some use cases (with the ORDERED strategy) expect one output row per seed record, and dropping records breaks that guarantee.
  2. Dropping the record also discards the other work already done for it (e.g. other generated columns), which may still be useful to the user.
  3. The trace is very useful for debugging parse failures.

Describe alternatives you've considered

Writing my own parser that also accepts arbitrary text and returns null when parsing fails. This is quite complicated, though.
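A minimal sketch of the requested behavior in plain Python. This is a hypothetical illustration, not the library's API: it assumes each record is a dict and that the raw model response is available as a string. The names parse_or_null and fill_column and the "__raw" column suffix are made up for this example. A failed parse keeps the record (so a run still yields one row per seed) with the field set to null, and the unparsed text is kept for debugging.

```python
import json
from typing import Any


def parse_or_null(raw: str) -> tuple[bool, Any]:
    """Hypothetical lenient parser: return (ok, value), with value None on failure."""
    try:
        return True, json.loads(raw)
    except json.JSONDecodeError:
        return False, None


def fill_column(record: dict, column: str, raw_response: str) -> dict:
    """Keep the record even if parsing fails; the failed column becomes None."""
    out = dict(record)  # copy so the other already-generated columns are preserved
    ok, value = parse_or_null(raw_response)
    out[column] = value
    if not ok:
        # Hypothetical side column holding the unparsed text, as a stand-in for the lost trace.
        out[f"{column}__raw"] = raw_response
    return out


seeds = [{"id": 0}, {"id": 1}]
responses = ['{"answer": 42}', "Sorry, here is some prose instead of JSON"]
dataset = [fill_column(s, "result", r) for s, r in zip(seeds, responses)]
```

Returning an (ok, value) pair rather than a bare None lets the caller distinguish a model that legitimately returned JSON null from a response that failed to parse.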

Additional context

No response

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests