Open
106 commits
8c9fe90
Changed default storage to in-memory. SQLite still available via config.
mattpocock Oct 19, 2025
15cface
Remove problematic backend-only-constants imports
github-actions[bot] Oct 19, 2025
ab2af0d
Fixed CI properly
mattpocock Oct 19, 2025
ed18fcb
Huge move from evals -> suites, and results -> evals
mattpocock Oct 19, 2025
e8a43c2
Added changeset
mattpocock Oct 19, 2025
9f0a2aa
Removed streaming text support from tasks.
mattpocock Oct 19, 2025
fb39ab9
feat: Support .env files by default via dotenv/config
github-actions[bot] Oct 19, 2025
18cfb03
feat: Support setupFiles from vitest.config.ts with evalite.config.ts…
github-actions[bot] Oct 19, 2025
0edd941
Fixes after cherrypick
mattpocock Oct 19, 2025
c5fc444
Formatting
mattpocock Oct 19, 2025
bfe79ab
Docs updates
mattpocock Oct 19, 2025
35e1547
Docs updates
mattpocock Oct 20, 2025
5d39883
feat: Add scorer utilities for LLM and embedding-based evaluations
cantemizyurek Oct 21, 2025
aff7318
feat: Integrate new faithfulness scorer and update dependencies
cantemizyurek Oct 21, 2025
5a83948
feat: Add AnswerSimilarity scorer for evaluating semantic similarity
cantemizyurek Oct 22, 2025
a3149c4
feat: Add evaluation script for Answer Similarity
cantemizyurek Oct 22, 2025
9f3aa48
feat: Add Context Recall Scorer
cantemizyurek Oct 22, 2025
747be50
feat: Add evaluation script for RAG Context Recall
cantemizyurek Oct 22, 2025
8b1dc45
refactor: Update scorers to use 'expected' instead of 'input.reference'
cantemizyurek Oct 22, 2025
29c3d3e
refactor: Remove failedToScore utility and replace with error in scorers
cantemizyurek Oct 22, 2025
f092f47
refactor: Update scoring schemas to use jsonSchema and remove zod dep…
cantemizyurek Oct 22, 2025
c749f27
refactor: Simplify answerSimilarity scorer by removing threshold logi…
cantemizyurek Oct 22, 2025
514519b
refactor: rename embedding to embeddingModel clearer
cantemizyurek Oct 22, 2025
55f4c22
refactor: update embedding property to embeddingModel for clarity
cantemizyurek Oct 22, 2025
886a309
refactor: Introduce Scorers namespace with types for LLM and embeddin…
cantemizyurek Oct 22, 2025
b06d74c
refactor: Move SingleTurnSample and EvaluationSample types to Scorers…
cantemizyurek Oct 22, 2025
19bc8d5
refactor: Update Evalite types to support userInput structure. And ad…
cantemizyurek Oct 22, 2025
8868807
refactor: Export utility functions for sample type checks in scorers
cantemizyurek Oct 22, 2025
3b84503
refactor: Rename retrievedContexts to groundTruth in scoring interfac…
cantemizyurek Oct 22, 2025
9843ad5
refactor: Consolidate scorer creation functions and enhance structure
cantemizyurek Oct 23, 2025
ff2eee2
refactor: Enhance scorer options structure with SingleTurnFn and Mult…
cantemizyurek Oct 23, 2025
ba3c1bb
fix: Fix evaluation input types to fit new format
cantemizyurek Oct 23, 2025
1554b80
refactor: Simplify function signatures in contextRecall and faithfuln…
cantemizyurek Oct 23, 2025
b6d58a4
Changed default storage to in-memory. SQLite still available via config.
mattpocock Oct 19, 2025
a888889
Remove problematic backend-only-constants imports
github-actions[bot] Oct 19, 2025
02ae5b6
Fixed CI properly
mattpocock Oct 19, 2025
5731e7c
feat: Add sheet overlay backdrop for evaluation routes
cantemizyurek Oct 22, 2025
d93fb41
fix: Update layout for ResultComponent to ensure minimum height is ma…
cantemizyurek Oct 22, 2025
6f14cd3
Create real-phones-join.md
mattpocock Oct 22, 2025
054d61e
feat: Swap from React Markdown to Streamdown
cantemizyurek Oct 22, 2025
84b14e4
refactor: change codeblocks theme to dark+ and light+
cantemizyurek Oct 24, 2025
eb1798b
fix: round millisecond durations to avoid floating point precision di…
github-actions[bot] Oct 24, 2025
33d5389
refactor: Simplify scorer factory API (#262)
mattpocock Oct 25, 2025
085daf6
Fix mismatch between input and output types. (#263)
cantemizyurek Oct 25, 2025
5deb925
Enhance dark theme (#274)
cantemizyurek Oct 27, 2025
9c83ad9
Add Search functionality (#277)
cantemizyurek Oct 28, 2025
83a6071
Add Tool Call Accuracy Scorer (#269)
cantemizyurek Oct 28, 2025
1889dad
Add watch mode test infrastructure
mattpocock Oct 28, 2025
9061c21
Add Prompt Builder utility (#278)
cantemizyurek Oct 30, 2025
a7f84f9
feat: Enhance scorer factory options with generic configuration types…
cantemizyurek Nov 1, 2025
a1fd381
Updated docs
mattpocock Nov 1, 2025
fad4927
Add Answer Correctness Scorer (#287)
cantemizyurek Nov 2, 2025
b826bec
Fixed type error
mattpocock Nov 2, 2025
74425f2
Reduced timeouts to speed up tests
mattpocock Nov 2, 2025
2d26c4c
Fixed a bug with the tests where we were overriding the global proces…
mattpocock Nov 2, 2025
32d42f5
Add final scorers for v1 (#290)
cantemizyurek Nov 5, 2025
a070af7
Bugfix
mattpocock Nov 5, 2025
288523b
Made better-sqlite3 an optional peer dependency
mattpocock Nov 5, 2025
980c60c
Fixed test
mattpocock Nov 5, 2025
ae83de1
Add 2 new scorers Exact Match and Contains (#275)
cantemizyurek Nov 5, 2025
9262396
matt/v1 tweaks (#302)
mattpocock Nov 6, 2025
787cdb8
Updated preview
mattpocock Nov 6, 2025
b3a8a45
Updated to attempt to fix bug
mattpocock Nov 6, 2025
00b454d
Debug
mattpocock Nov 6, 2025
f315b25
Fixed v1 branch
mattpocock Nov 6, 2025
2e6bb04
Removed sqlite-storage imports
mattpocock Nov 6, 2025
fe4d50b
Reverted scorers changes in types to go back to TExpected
mattpocock Nov 8, 2025
732f66d
Refactored scorers to be simple functions
mattpocock Nov 8, 2025
daffd5a
Removed useless metadata
mattpocock Nov 8, 2025
bda38a6
Fixed label in sidebar
mattpocock Nov 8, 2025
8f9495c
Made it so passing UI messages (from AI SDK) directly into Evalite sp…
mattpocock Nov 8, 2025
bdc5115
Added additional rendering logic so that arrays of object show up as …
mattpocock Nov 8, 2025
a5a0c56
Added `only` option to variants in `evalite.each()` to selectively ru…
mattpocock Nov 8, 2025
ff8c08c
Made it so AI SDK tool calls display nicely in the UI
mattpocock Nov 8, 2025
c76a1c9
Improved the way tool calls look
mattpocock Nov 8, 2025
a9a8883
Made scorer `name` field optional. When using pre-built scorers, name…
mattpocock Nov 8, 2025
78de9b1
Simplified the tool call display logic so users can mock it themselves
mattpocock Nov 9, 2025
ba3d876
Removed implicit reading of vitest.config.ts/vite.config.ts files. Us…
mattpocock Nov 9, 2025
270ba38
Add failing test reproducing issue #95
mattpocock Oct 20, 2025
fabc7a6
Added test for #91
mattpocock Nov 9, 2025
9263870
Added rerun button to UI in watch and serve modes
mattpocock Nov 9, 2025
031cf82
Fixed CI
mattpocock Nov 9, 2025
08e62c2
Removed rerun test
mattpocock Nov 9, 2025
ed3c662
v1.0.0-beta.0
mattpocock Nov 9, 2025
fcdf6f1
Add beta banner and Vercel config for beta docs
mattpocock Nov 9, 2025
44651b6
Trigger Vercel deployment
mattpocock Nov 9, 2025
b64d1bb
matt/docs improvements (#311)
mattpocock Nov 9, 2025
059edfd
Updated docs
mattpocock Nov 9, 2025
729304c
feat: init fumadocs
cantemizyurek Nov 9, 2025
eb7f2c7
feat: add Logo component and update dependencies
cantemizyurek Nov 9, 2025
c7335c4
feat: port docs from old docs
cantemizyurek Nov 9, 2025
68495a5
feat: add page actions and update dependencies
cantemizyurek Nov 9, 2025
3cb9bbc
feat: add custom rewrites for MDX documentation paths
cantemizyurek Nov 9, 2025
59b8378
feat: add favicon.ico for improved branding in documentation
cantemizyurek Nov 9, 2025
b3cf938
fix: update quickstart guide description to reflect Evalite setup ins…
cantemizyurek Nov 9, 2025
11e9a39
fix: update quickstart guide to use npm commands for installation and…
cantemizyurek Nov 9, 2025
29cfa94
feat: update documentation layout
cantemizyurek Nov 9, 2025
8425817
refactor: enhance layout and responsiveness of home page components
cantemizyurek Nov 9, 2025
8daa90f
feat: add navigation link to documentation in home layout
cantemizyurek Nov 9, 2025
ef1403a
chore: remove unused icon from documentation link in home layout
cantemizyurek Nov 9, 2025
58e16a0
feat: add metadata for Open Graph and Twitter cards in layout
cantemizyurek Nov 9, 2025
2334ae0
chore: remove old evalite-docs files
cantemizyurek Nov 9, 2025
dc1244d
feat: integrate Geist font and update package dependencies in layout
cantemizyurek Nov 9, 2025
94240c1
chore: update layout metadata and remove obsolete Astro type definitions
cantemizyurek Nov 9, 2025
f6b8535
fix: adjust padding in testimonials section for improved layout consi…
cantemizyurek Nov 10, 2025
26e7f52
fix: correct spelling of 'evalite' in features section description
cantemizyurek Nov 10, 2025
5 changes: 5 additions & 0 deletions .changeset/0000-export-command-change.md
@@ -0,0 +1,5 @@
---
"evalite": major
---

Export command now uses the storage specified in the config, and automatically runs the evals first if that storage is empty.
5 changes: 5 additions & 0 deletions .changeset/0000-in-memory-default.md
@@ -0,0 +1,5 @@
---
"evalite": major
---

Changed default storage to in-memory. SQLite still available via config.
5 changes: 5 additions & 0 deletions .changeset/0000-optional-scorer-name.md
@@ -0,0 +1,5 @@
---
"evalite": patch
---

Made scorer `name` field optional. When using pre-built scorers, name and description are now automatically extracted from the scorer's return value.
34 changes: 34 additions & 0 deletions .changeset/0000-remove-implicit-vitest-config.md
@@ -0,0 +1,34 @@
---
"evalite": major
---

Removed implicit reading of vitest.config.ts/vite.config.ts files. Users must now explicitly pass Vite config via evalite.config.ts using the new `viteConfig` option. This change makes configuration more explicit and less confusing.

**Migration Guide:**

Before:

```ts
// vitest.config.ts was automatically read
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    testTimeout: 60000,
  },
});
```

After:

```ts
// evalite.config.ts
import { defineConfig } from "evalite/config";
import viteConfig from "./vite.config.ts";

export default defineConfig({
  viteConfig: viteConfig,
  // Note: testTimeout, maxConcurrency, and setupFiles
  // must be at root level, not in viteConfig.test
  testTimeout: 60000,
  setupFiles: ["./setup.ts"],
});
```
5 changes: 5 additions & 0 deletions .changeset/0000-remove-streaming.md
@@ -0,0 +1,5 @@
---
"evalite": minor
---

Removed streaming text support from tasks. Process streams before returning from task() (e.g., await result.text for AI SDK).
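
As a rough illustration of "process streams before returning", the sketch below drains a stream of text chunks into a single string inside the task. `collectText`, `fakeStream`, and `task` are hypothetical names for illustration only, not Evalite or AI SDK APIs; with the AI SDK, the changeset's own suggestion is simply to `await result.text`.

```typescript
// Hypothetical helper: drain a stream of text chunks into one string.
async function collectText(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk;
  }
  return text;
}

// Hypothetical stand-in for a model's streaming response.
async function* fakeStream(): AsyncGenerator<string> {
  yield "Hello, ";
  yield "world";
}

// A task-shaped function: it resolves to the full string, never to the stream.
async function task(_input: string): Promise<string> {
  return collectText(fakeStream());
}
```

The point of the design is that the value returned from the task is already fully materialised, so nothing downstream has to know about streaming.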
5 changes: 5 additions & 0 deletions .changeset/0000-rerun-button.md
@@ -0,0 +1,5 @@
---
"evalite": patch
---

Added a rerun button to the UI in watch and serve modes.
5 changes: 5 additions & 0 deletions .changeset/0000-variant-only.md
@@ -0,0 +1,5 @@
---
"evalite": patch
---

Added `only` option to variants in `evalite.each()` to selectively run specific variants.
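
The changeset does not spell out the selection rule, so the following is only a sketch under an assumption: that `only` behaves like `test.only`, meaning that if any variant sets it, just those variants run, and otherwise all of them do. The `Variant` shape and `selectVariants` are hypothetical and are not taken from Evalite's source.

```typescript
// Hypothetical variant shape; only the `only` flag comes from the changeset.
interface Variant<T> {
  name: string;
  input: T;
  only?: boolean;
}

// Assumed semantics (mirroring `test.only`): if any variant sets `only`,
// keep just those; otherwise keep every variant.
function selectVariants<T>(variants: Variant<T>[]): Variant<T>[] {
  const focused = variants.filter((v) => v.only === true);
  return focused.length > 0 ? focused : variants;
}

const variants: Variant<{ model: string }>[] = [
  { name: "small", input: { model: "model-a" } },
  { name: "large", input: { model: "model-b" }, only: true },
];

// Names of the variants that would run under the assumed rule.
const toRun = selectVariants(variants).map((v) => v.name);
```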
5 changes: 5 additions & 0 deletions .changeset/0234-auto-dotenv-support.md
@@ -0,0 +1,5 @@
---
"evalite": minor
---

Support .env files by default via dotenv/config. Environment variables from .env files are now automatically loaded without any configuration needed. Users no longer need to manually add `setupFiles: ["dotenv/config"]` to their evalite.config.ts.
5 changes: 5 additions & 0 deletions .changeset/angry-dogs-sort.md
@@ -0,0 +1,5 @@
---
"evalite": patch
---

Passing UI messages (from the AI SDK) directly into Evalite now displays them in a custom UI.
5 changes: 5 additions & 0 deletions .changeset/long-olives-give.md
@@ -0,0 +1,5 @@
---
"evalite": major
---

Renamed the storage API concepts: evals are now suites, and results are now evals. This will likely break existing SQLite databases when released, so a migration will be needed.
26 changes: 26 additions & 0 deletions .changeset/pre.json
@@ -0,0 +1,26 @@
{
  "mode": "pre",
  "tag": "beta",
  "initialVersions": {
    "evalite": "0.19.0",
    "evalite-docs": "0.0.1",
    "evalite-tests": "0.0.11",
    "example": "0.0.11"
  },
  "changesets": [
    "0000-export-command-change",
    "0000-in-memory-default",
    "0000-optional-scorer-name",
    "0000-remove-implicit-vitest-config",
    "0000-remove-streaming",
    "0000-rerun-button",
    "0000-variant-only",
    "0234-auto-dotenv-support",
    "angry-dogs-sort",
    "long-olives-give",
    "real-phones-join",
    "table-rendering",
    "thick-birds-design",
    "wet-clocks-camp"
  ]
}
5 changes: 5 additions & 0 deletions .changeset/real-phones-join.md
@@ -0,0 +1,5 @@
---
"evalite-ui": patch
---

Added an overlay to the backdrop when viewing a trace.
5 changes: 5 additions & 0 deletions .changeset/sixty-jeans-melt.md
@@ -0,0 +1,5 @@
---
"evalite": major
---

Dropped compatibility with autoevals, and implemented our own built-in library of scorers.
5 changes: 5 additions & 0 deletions .changeset/table-rendering.md
@@ -0,0 +1,5 @@
---
"evalite": patch
---

The UI now renders simple arrays of objects and flat objects as markdown tables instead of JSON trees, for better readability.
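
To make the idea concrete, here is a minimal sketch of turning an array of flat objects into a markdown table. `toMarkdownTable` and the `Row` type are hypothetical; this is not Evalite's actual rendering code, and it ignores edge cases such as escaping `|` inside cell values.

```typescript
// A "flat" row: primitive values only, no nested objects or arrays.
type Row = Record<string, string | number | boolean | null>;

// Hypothetical helper: render an array of flat objects as a markdown table.
function toMarkdownTable(rows: Row[]): string {
  if (rows.length === 0) return "";

  // Collect column names in first-seen order across all rows.
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }

  const line = (cells: string[]): string => `| ${cells.join(" | ")} |`;
  const header = line(columns);
  const divider = line(columns.map(() => "---"));
  // Missing or null values become empty cells.
  const body = rows.map((row) =>
    line(columns.map((col) => String(row[col] ?? "")))
  );

  return [header, divider, ...body].join("\n");
}

const table = toMarkdownTable([
  { name: "a", score: 1 },
  { name: "b", score: 0.5 },
]);
```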
5 changes: 5 additions & 0 deletions .changeset/thick-birds-design.md
@@ -0,0 +1,5 @@
---
"evalite": patch
---

Made better-sqlite3 an optional peer dependency.
5 changes: 5 additions & 0 deletions .changeset/wet-clocks-camp.md
@@ -0,0 +1,5 @@
---
"evalite-ui": minor
---

Added the ability to search and filter evals in the UI.
5 changes: 4 additions & 1 deletion .github/workflows/preview.yml
@@ -10,11 +10,14 @@ jobs:
preview:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v5
with:
fetch-depth: 0

- uses: pnpm/action-setup@v4

- run: git status

- uses: actions/setup-node@v4
with:
node-version: 22.x
26 changes: 26 additions & 0 deletions apps/docs/.gitignore
@@ -0,0 +1,26 @@
# deps
/node_modules

# generated content
.source

# test & build
/coverage
/.next/
/out/
/build
*.tsbuildinfo

# misc
.DS_Store
*.pem
/.pnp
.pnp.js
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# others
.env*.local
.vercel
next-env.d.ts
45 changes: 45 additions & 0 deletions apps/docs/README.md
@@ -0,0 +1,45 @@
# docs

This is a Next.js application generated with
[Create Fumadocs](https://github.com/fuma-nama/fumadocs).

Run the development server:

```bash
npm run dev
# or
pnpm dev
# or
yarn dev
```

Open http://localhost:3000 with your browser to see the result.

## Explore

In the project, you can see:

- `lib/source.ts`: Code for the content source adapter; [`loader()`](https://fumadocs.dev/docs/headless/source-api) provides the interface to access your content.
- `lib/layout.shared.tsx`: Shared options for layouts; optional, but worth keeping.

| Route | Description |
| ------------------------- | ------------------------------------------------------ |
| `app/(home)` | The route group for your landing page and other pages. |
| `app/docs` | The documentation layout and pages. |
| `app/api/search/route.ts` | The Route Handler for search. |

### Fumadocs MDX

A `source.config.ts` config file has been included; you can customise options such as the frontmatter schema.

Read the [Introduction](https://fumadocs.dev/docs/mdx) for further details.

## Learn More

To learn more about Next.js and Fumadocs, take a look at the following
resources:

- [Next.js Documentation](https://nextjs.org/docs) - learn about Next.js
features and API.
- [Learn Next.js](https://nextjs.org/learn) - an interactive Next.js tutorial.
- [Fumadocs](https://fumadocs.dev) - learn about Fumadocs.
84 changes: 84 additions & 0 deletions apps/docs/app/(home)/components/cta-section.tsx
@@ -0,0 +1,84 @@
import Link from "next/link";
import { buttonVariants } from "@/components/ui/button";
import { cn } from "@/lib/cn";
import { ArrowRight, BookOpen, Code, Rocket } from "lucide-react";

export function CTASection() {
  return (
    <section className="flex flex-col border-b border-border w-full">
      <div className="grid grid-cols-1 lg:grid-cols-2 border-b border-fd-border">
        <div className="flex flex-col justify-center gap-6 p-6 sm:p-8 lg:p-12 border-b lg:border-b-0 lg:border-r border-fd-border bg-fd-accent/30">
          <div className="flex flex-col gap-3">
            <h2 className="text-3xl sm:text-4xl lg:text-5xl font-semibold leading-tight">
              Start building better AI apps
            </h2>
            <p className="text-fd-muted-foreground text-base sm:text-lg leading-relaxed">
              Get started with Evalite in minutes. Write your first eval and see
              results instantly.
            </p>
          </div>
          <div className="flex flex-col gap-3">
            <Link href="/docs">
              <button
                className={cn(
                  buttonVariants({ color: "primary" }),
                  "rounded-none gap-2 w-full justify-between group"
                )}
              >
                <span className="flex items-center gap-2">
                  <BookOpen className="size-4" />
                  Read Documentation
                </span>
                <ArrowRight className="size-4 group-hover:translate-x-1 transition-transform" />
              </button>
            </Link>
            <Link href="/docs/guides/quickstart">
              <button
                className={cn(
                  buttonVariants({ color: "outline" }),
                  "rounded-none gap-2 w-full justify-between group"
                )}
              >
                <span className="flex items-center gap-2">
                  <Rocket className="size-4" />
                  Quick Start Guide
                </span>
                <ArrowRight className="size-4 group-hover:translate-x-1 transition-transform" />
              </button>
            </Link>
          </div>
        </div>
        <div className="flex flex-col justify-center gap-6 sm:gap-8 p-6 sm:p-8 lg:p-12">
          <div className="flex items-start gap-3 sm:gap-4">
            <div className="p-2 sm:p-3 border border-fd-border bg-fd-accent/20 shrink-0">
              <Code className="size-5 sm:size-6 text-fd-foreground" />
            </div>
            <div className="flex flex-col gap-2 flex-1 min-w-0">
              <h3 className="text-lg sm:text-xl font-semibold">
                TypeScript Native
              </h3>
              <p className="text-fd-muted-foreground text-sm sm:text-base">
                Write evals in TypeScript with full type safety and IntelliSense
                support.
              </p>
            </div>
          </div>
          <div className="flex items-start gap-3 sm:gap-4">
            <div className="p-2 sm:p-3 border border-fd-border bg-fd-accent/20 shrink-0">
              <Rocket className="size-5 sm:size-6 text-fd-foreground" />
            </div>
            <div className="flex flex-col gap-2 flex-1 min-w-0">
              <h3 className="text-lg sm:text-xl font-semibold">
                Local Development
              </h3>
              <p className="text-fd-muted-foreground text-sm sm:text-base">
                Run everything locally. No API keys, no cloud services, just
                your code.
              </p>
            </div>
          </div>
        </div>
      </div>
    </section>
  );
}
28 changes: 28 additions & 0 deletions apps/docs/app/(home)/components/decorative-panel.tsx
@@ -0,0 +1,28 @@
interface DecorativePanelProps {
  variant: "left" | "right";
}

export function DecorativePanel({ variant }: DecorativePanelProps) {
  const isLeft = variant === "left";

  return (
    <div
      className={`hidden lg:flex flex-1 ${isLeft ? "border-r" : "border-l"} border-border relative overflow-hidden`}
    >
      <div
        className="absolute inset-0"
        style={{
          backgroundImage: `repeating-linear-gradient(
            ${isLeft ? "135deg" : "45deg"},
            transparent,
            transparent 16px,
            currentColor 16px,
            currentColor 17.5px
          )`,
          color: "hsl(var(--border))",
          opacity: 0.3,
        }}
      />
    </div>
  );
}