Conversation

@marliessophie
Member

@marliessophie marliessophie commented Feb 9, 2026

Important

Adds support for running experiments on versioned datasets by introducing a version parameter to dataset fetching and experiment running functions, with a new E2E test to verify functionality.

  • Behavior:
    • DatasetManager.get() now accepts an optional version timestamp, forwards it to datasetItems.list, and exposes it on the returned FetchedDataset object.
    • runExperiment() from a FetchedDataset forwards version as datasetVersion to experiment.run().
    • ExperimentParams gains an optional datasetVersion?: string field.
    • E2E test added in datasets.e2e.test.ts for fetching a dataset at a given version and running an experiment over that snapshot.
  • Main Issue:
    • E2E test’s version timestamp computation can be flaky due to reliance on local time arithmetic (createdAt + 1000ms).

This description was created by Ellipsis for 1ddf69c.


Disclaimer: Experimental PR review

Greptile Overview

Greptile Summary

This PR adds support for running experiments against a versioned snapshot of a Langfuse dataset.

  • DatasetManager.get() now accepts an optional version timestamp, forwards it to datasetItems.list, and exposes it on the returned FetchedDataset object. When calling runExperiment() from that dataset, the same value is forwarded as datasetVersion to experiment.run().
  • ExperimentParams gains an optional datasetVersion?: string field.
  • E2E coverage is added for fetching a dataset at a given version and running an experiment over that snapshot.
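The plumbing described above can be sketched roughly as follows. This is a simplified, self-contained illustration, not the SDK code from this PR: `ItemsApi`, `getDataset`, `pageSize`, and the reduced `ExperimentParams` and `FetchedDataset` shapes are hypothetical stand-ins, and only the idea (the optional `version` goes to every item-list page request and is later forwarded as `datasetVersion`) comes from the summary.

```typescript
// Hypothetical, simplified shapes for illustration; not the real Langfuse SDK types.
type DatasetItem = { id: string; input: string };

interface ItemsApi {
  list(req: {
    datasetName: string;
    page: number;
    limit: number;
    version?: string;
  }): Promise<{ data: DatasetItem[]; totalPages: number }>;
}

type ExperimentParams = {
  name: string;
  data: DatasetItem[];
  datasetVersion?: string; // the new optional field described in the summary
};

type ExperimentRun = (params: ExperimentParams) => Promise<string>;

type FetchedDataset = {
  name: string;
  items: DatasetItem[];
  version?: string;
  runExperiment(params: { name: string }): Promise<string>;
};

async function getDataset(
  api: ItemsApi,
  run: ExperimentRun,
  name: string,
  options?: { version?: string; pageSize?: number },
): Promise<FetchedDataset> {
  const version = options?.version;
  const limit = options?.pageSize ?? 50;
  const items: DatasetItem[] = [];

  // Paginate items; include `version` in each request only when it was provided.
  for (let page = 1; ; page++) {
    const res = await api.list({
      datasetName: name,
      page,
      limit,
      ...(version !== undefined ? { version } : {}),
    });
    items.push(...res.data);
    if (page >= res.totalPages) break;
  }

  return {
    name,
    items,
    version, // exposed on the fetched dataset
    // Forward the same value as `datasetVersion` when running an experiment.
    runExperiment: (params) => run({ ...params, data: items, datasetVersion: version }),
  };
}
```

When no version is passed, the sketch sends no `version` key at all and forwards `datasetVersion: undefined`, so callers that do not use versioning see unchanged requests.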

Main issue to address before merge: the new E2E test’s version timestamp computation can be flaky because it relies on local time arithmetic (createdAt + 1000ms) rather than a server-observed ordering guarantee between the initial item write and the upsert.
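To make the failure mode concrete, here is a small in-memory model of snapshot semantics. It is an assumption-laden sketch: `ItemWrite` and `snapshotAt` are hypothetical, and the model assumes a versioned query returns, per item, the latest write at or before the version timestamp. The real server behavior may differ in detail.

```typescript
// Hypothetical in-memory model of versioned dataset items (not the server implementation).
type ItemWrite = { itemId: string; input: string; writtenAt: Date };

// For each item, keep the latest write whose timestamp is at or before `version`.
function snapshotAt(writes: ItemWrite[], version: Date): Map<string, ItemWrite> {
  const sorted = [...writes].sort((a, b) => a.writtenAt.getTime() - b.writtenAt.getTime());
  const snapshot = new Map<string, ItemWrite>();
  for (const w of sorted) {
    if (w.writtenAt.getTime() <= version.getTime()) snapshot.set(w.itemId, w);
  }
  return snapshot;
}

const createdAt = new Date("2026-02-09T16:00:00.000Z");
// The pattern under review: version = createdAt + 1000ms.
const version = new Date(createdAt.getTime() + 1000);

const create: ItemWrite = { itemId: "item1", input: "original", writtenAt: createdAt };
// Upsert stamped 1500ms after creation: it falls outside the snapshot.
const slowUpsert: ItemWrite = {
  itemId: "item1",
  input: "updated",
  writtenAt: new Date(createdAt.getTime() + 1500),
};
// Upsert stamped 400ms after creation: it falls inside the snapshot.
const fastUpsert: ItemWrite = {
  itemId: "item1",
  input: "updated",
  writtenAt: new Date(createdAt.getTime() + 400),
};

const withSlowUpsert = snapshotAt([create, slowUpsert], version).get("item1")?.input; // "original"
const withFastUpsert = snapshotAt([create, fastUpsert], version).get("item1")?.input; // "updated"
```

Under this model the assertion on the original input only holds when the upsert happens to be stamped more than 1000ms after creation, which the test does not control.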

Confidence Score: 4/5

  • This PR is reasonably safe to merge once the new E2E test flakiness around version timestamps is addressed.
  • The core change is straightforward plumbing: an optional version timestamp passes through dataset item listing and into experiment execution via a new optional datasetVersion param. The main risk is CI instability, because the added E2E test relies on local time arithmetic and ingestion timing and can fail intermittently even when the feature works correctly.
  • File to review closely: tests/e2e/datasets.e2e.test.ts

Important Files Changed

Filename | Overview
packages/client/src/dataset/index.ts | Threads an optional dataset version through item listing, exposes it on FetchedDataset, and forwards it as datasetVersion when calling experiment.run.
packages/client/src/experiment/types.ts | Extends ExperimentParams with optional datasetVersion string to support running experiments against a snapshot of a versioned dataset.
tests/e2e/datasets.e2e.test.ts | Adds an end-to-end test that fetches a dataset at a version timestamp, verifies item snapshot semantics, and runs an experiment on that versioned dataset.

Sequence Diagram

sequenceDiagram
  participant T as Test (datasets.e2e)
  participant C as LangfuseClient
  participant D as DatasetManager
  participant API as Langfuse API
  participant E as ExperimentRunner

  T->>C: api.datasets.create(name)
  T->>C: api.datasetItems.create(item1)
  T->>C: waitForServerIngestion()
  T->>D: dataset.get(name)
  D->>API: datasets.get(name)
  loop paginate items
    D->>API: datasetItems.list(datasetName, page, limit)
    Note over D,API: includes {version} if provided
  end
  T->>C: api.datasetItems.create(upsert item1)
  T->>C: api.datasetItems.create(item2)
  T->>D: dataset.get(name, {version})
  D->>API: datasetItems.list(..., version)
  D->>E: experiment.run({data: items, datasetVersion: version, ...})
  E->>API: create dataset run + item runs
  E-->>T: ExperimentResult

@vercel

vercel bot commented Feb 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
langfuse-js Ready Ready Preview Feb 11, 2026 1:31pm

@vercel

vercel bot commented Feb 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
langfuse-js Ready Ready Preview Feb 9, 2026 4:10pm

Contributor

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 1 comment

@greptile-apps
Contributor

greptile-apps bot commented Feb 9, 2026

Additional Comments (1)

tests/e2e/datasets.e2e.test.ts, line 600
Version timestamp is flaky

This test derives versionTimestamp as createdAt + 1000ms and assumes the subsequent upsert happens strictly after that timestamp. If the server timestamps the upsert at or before versionTimestamp (clock skew, coarse timestamp precision, or ingestion ordering), the versioned query will legitimately include the updated state and the assertion on original input/expectedOutput will fail.

Use a version timestamp that is guaranteed to be between the two writes as observed by the server. Two options:

  • Fetch createdAt from the updated item and set versionTimestamp to a value between the two server-assigned times.
  • Avoid time arithmetic: pick versionTimestamp = item1CreatedAt.toISOString() and ensure the update happens after by waiting until server time advances past it.
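A minimal sketch of the "value between the two server-assigned times" suggestion. `versionBetween` is a hypothetical helper, not part of the SDK or the test suite. It assumes both timestamps come from the server, that version matching is inclusive ("at or before"), and that timestamps have millisecond precision.

```typescript
// Hypothetical helper: choose a version timestamp that includes the first write
// and excludes the second, using only server-assigned timestamps.
function versionBetween(firstWriteAt: Date, secondWriteAt: Date): string {
  const lo = firstWriteAt.getTime();
  const hi = secondWriteAt.getTime();
  if (Number.isNaN(lo) || Number.isNaN(hi)) {
    throw new Error("Invalid timestamp");
  }
  // Assumption: a version equal to a write's timestamp includes that write,
  // so the result must satisfy lo <= version < hi.
  if (hi <= lo) {
    throw new Error("Second write is not strictly after the first; no safe version exists");
  }
  // Midpoint rounded down stays within [lo, hi).
  return new Date(lo + Math.floor((hi - lo) / 2)).toISOString();
}
```

If the helper throws, the two writes were stamped within the same millisecond (or out of order), and the test would need to repeat the upsert after a short wait rather than guess a timestamp.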

@marliessophie marliessophie merged commit 0af0fd6 into main Feb 11, 2026
10 checks passed
@marliessophie marliessophie deleted the marlies/lfe-support-dataset-versioning branch February 11, 2026 13:43