testing framework for transformRequest+Response #72
Conversation
{
  "error": "400 {\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"message\":\"`max_tokens` must be greater than `thinking.budget_tokens`. Please consult our documentation at https://docs.claude.com/en/docs/build-with-claude/extended-thinking#max-tokens-and-context-window-size\"},\"request_id\":\"req_011CXasVyLu26rs4f6bS7DRJ\"}",
  "name": "Error"
}
Only error I found so far. This seems to be valid, since in our case we define max_tokens = 100 with high reasoning.
Not sure if we want to throw our own error or something. https://github.com/braintrustdata/lingua/blob/main/payloads/cases/simple.ts#L146
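If we do go the route of throwing our own error, a rough pre-flight check could look like the sketch below. The helper and type here are hypothetical, not existing lingua code; the field names follow the Anthropic Messages API (max_tokens, thinking.budget_tokens).

```ts
// Hypothetical pre-flight guard; field names follow the Anthropic Messages API.
interface ThinkingRequest {
  max_tokens: number;
  thinking?: { type: "enabled"; budget_tokens: number };
}

function assertThinkingBudgetFits(req: ThinkingRequest): void {
  if (req.thinking && req.max_tokens <= req.thinking.budget_tokens) {
    throw new Error(
      `max_tokens (${req.max_tokens}) must be greater than ` +
        `thinking.budget_tokens (${req.thinking.budget_tokens})`,
    );
  }
}
```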
map.insert(
    "input_tokens_details".into(),
    serde_json::json!({ "cached_tokens": self.prompt_cached_tokens.unwrap_or(0) }),
);
Caught this from the validate_response_json JS binding: Responses require input_tokens_details and output_tokens_details.
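For reference, this is roughly the usage shape that appears to be expected, sketched as a TypeScript type based on the OpenAI Responses API usage fields; treat the exact set of required keys as an assumption rather than the validator's definition.

```ts
// Sketch of a Responses-style usage object with the *_details fields present.
// Which fields the validator actually requires is an assumption here.
interface ResponsesUsage {
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
  input_tokens_details: { cached_tokens: number };
  output_tokens_details: { reasoning_tokens: number };
}
```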
}
/* eslint-enable @typescript-eslint/consistent-type-assertions */

const isParamCase = (name: string) => name.endsWith("Param");
Temporary; I didn't want to explode the diff, so doing this here.
@@ -0,0 +1,35 @@
{
anthropic_to_chatcompletions means an Anthropic payload sent against a chat completions model; we save the actual chat completions response payload so we don't have to incur the LLM cost again.
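As a reading aid, the fixture lookup this naming scheme implies would look roughly like the sketch below. The helper name and exact path handling are illustrative, following the transforms/{src}_to_{tgt}/{case}.json layout from the diagram in the PR description.

```ts
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Illustrative helper: load the saved provider response for a source/target
// pair and case, e.g. transforms/anthropic_to_chatcompletions/<case>.json.
function loadSavedResponse(
  source: string,
  target: string,
  caseName: string,
): unknown {
  const path = join("transforms", `${source}_to_${target}`, `${caseName}.json`);
  return JSON.parse(readFileSync(path, "utf8"));
}
```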
} catch (e) {
  // Check if this is an expected error (known provider incompatibility)
  const errorReason = transformErrors[pairKey]?.[caseName];
  if (errorReason) {
we should check that the errorReason matches the actual reason.
the reason listed in transform_errors.json is a semantic reason for the reader, not the actual error.
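If we ever want to tighten this, one option (a sketch only, with a hypothetical shape for the transform_errors.json entries) is to keep the semantic reason for the reader and optionally an expected fragment of the real error message to assert against:

```ts
// Hypothetical entry shape: human-readable reason plus an optional
// substring of the actual error message.
interface ExpectedTransformError {
  reason: string; // semantic reason for the reader
  messageIncludes?: string; // optional fragment of the real error message
}

function assertExpectedError(e: unknown, expected: ExpectedTransformError): void {
  const message = e instanceof Error ? e.message : String(e);
  if (expected.messageIncludes && !message.includes(expected.messageIncludes)) {
    throw new Error(
      `Expected error containing "${expected.messageIncludes}", got: ${message}`,
    );
  }
}
```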
// Explicitly skipped tests (add here only if intentionally not supported)
// Format: "source_to_target_caseName"
const SKIPPED_TESTS = new Set<string>([
  // Add entries here with comments explaining why
do we need that if the list is currently empty?
This is a remnant; deleting.
This PR adds a testing framework for transformRequest and transformResponse. It is specifically an offline testing framework: we test lingua transformations against saved snapshots.

We use the existing cases and the WASM bindings generated in the previous PR (`transform_request`, `transform_response`, and `validate_*_json`) to verify valid transforms and no regressions.

At a high level, my mental model is:

- `coverage-report` ensures internal consistency between the Universal model & all the providers.
- `transforms` ensures external compatibility, with OpenAPI schema validations during every transform and using the actual SDK post-transform.

Diagram:
Phase 1: Capture (one-time, requires API keys)

  getCaseForProvider(caseName, source)
                     │
                     ▼
  ┌─────────────────────────────────────┐
  │ transformAndValidateRequest()       │
  │ • transform_request() ──────────────┼──► validate_*_request()
  └──────────────────┬──────────────────┘
                     │
                     ▼
  ┌─────────────────────────────────────┐
  │ callProvider(target, request)       │
  │ • openai.chat.completions.create()  │
  │ • anthropic.messages.create()       │
  └──────────────────┬──────────────────┘
                     │
                     ▼
        validate_*_response()
                     │
                     ▼
        writeFileSync(path, response)
        transforms/{src}_to_{tgt}/{case}.json

Phase 2: Test (CI, no API calls)

  getCaseForProvider(caseName, source)
                     │
                     ▼
  ┌─────────────────────────────────────┐
  │ transformAndValidateRequest()       │
  │ • transform_request() ──────────────┼──► validate_*_request()
  └──────────────────┬──────────────────┘
                     │
                     ▼
        toMatchSnapshot("request")
                     │
                     ▼
  ┌─────────────────────────────────────┐
  │ loadAndValidateResponse()           │
  │ • readFileSync(path) ───────────────┼──► validate_*_response()
  └──────────────────┬──────────────────┘
                     │
                     ▼
  ┌─────────────────────────────────────┐
  │ transformResponseData()             │
  │ • transform_response() ─────────────┼──► validate_*_response()
  └──────────────────┬──────────────────┘
                     │
                     ▼
        toMatchSnapshot("response")
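In code, the Phase 2 flow per source/target pair and case looks roughly like the sketch below. Vitest/Jest-style globals are assumed, the helper signatures are inferred from the diagram, and the surrounding loops over pairs and cases are elided.

```ts
// Sketch of the offline (Phase 2) test flow for one source→target pair and case.
// Helper signatures and argument order are assumptions based on the diagram.
test(`${source}_to_${target}_${caseName}`, () => {
  const sourceRequest = getCaseForProvider(caseName, source);

  // transform_request() + validate_*_request(), then snapshot the result.
  const request = transformAndValidateRequest(sourceRequest, source, target);
  expect(request).toMatchSnapshot("request");

  // readFileSync(transforms/{src}_to_{tgt}/{case}.json) + validate_*_response().
  const savedResponse = loadAndValidateResponse(source, target, caseName);

  // transform_response() back toward the source format + validation, then snapshot.
  const response = transformResponseData(savedResponse, target, source);
  expect(response).toMatchSnapshot("response");
});
```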