…ation functionality
… error handling and NDJSON support
…e type safety in client operations
…mproved error handling
…D generation; remove unused embedding config logic
…ed documentation and remove OpenAI references
…update related API and SDK logic
…embedding recomputation logic
…lement chunk table creation and embedding handling
…oint and related request/response models
- docs: update PROJECT.md to reflect embedding provider change to Gemini only
- docs: revise README.md for clarity on Vexi's functionality and usage
- chore: update STATUS.md with next steps for documentation cleanup
- chore: update TODO.md to include .env.example file addition
- feat: add api/.env.example for environment variable configuration
- fix: update example-app to use new client import and improve logging
- chore: update example-app package-lock.json for dependency versions
- chore: update example-app package.json to include reindex script
- fix: refine schema.ts in example-app for embedding configuration
Pull request overview
This PR makes Vexi “v1/production-ready” by formalizing the schema contract and workflow end-to-end: a runtime Table wrapper + stable-ish schema serialization in the SDK/CLI, a new Rust/Axum API with a schema registry and additive migrations, plus documentation and CI to codify the intended developer experience.
Changes:
- SDK: introduce `createTable`/`Table` wrapper, richer inferred types (`Row`, `InsertInput`, etc.), and expand the client to support `insert` (returning rows), `update`, and `search`.
- CLI: switch to v1-only one-shot `POST /sync`, improve table detection and output modes, add `reindex`.
- API: replace the legacy TS/Fastify server with a Rust/Axum server implementing `/sync`, insert/update/search/reindex, schema registry persistence, and Gemini embedding integration; add env/CI/docs updates.
Reviewed changes
Copilot reviewed 35 out of 39 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| sdk/src/schema.ts | Adds Table wrapper + v1 table types (Row, InsertInput, etc.) and createTable. |
| sdk/src/index.ts | Re-exports new schema APIs/types from the SDK entrypoint. |
| sdk/src/client.ts | Updates typed client to support insert/update/search + better error handling. |
| sdk/src/cli.ts | Reworks CLI sync to v1 one-shot /sync, adds reindex command and improved output. |
| sdk/AGENTS.md | Updates SDK conventions (notably “type-only, no interface”) and createTable docs. |
| example-app/schema.ts | Updates example schema to consumer-style imports and createTable. |
| example-app/main.ts | Updates example usage to new insert/update/search behavior and output. |
| example-app/package.json | Uses local vexi dependency and adds reindex script; pins tool versions. |
| example-app/package-lock.json | Locks updated deps including local vexi link. |
| api/tsconfig.json | Removes legacy TS API config (API is now Rust). |
| api/tsconfig.build.json | Removes legacy TS build config. |
| api/src/validator.ts | Removes Zod validator from legacy TS API. |
| api/src/schema.ts | Removes Arrow schema conversion from legacy TS API. |
| api/src/index.ts | Removes legacy Fastify server entrypoint. |
| api/scripts/inspect.ts | Removes legacy TS inspection script. |
| api/prettier.config.mts | Removes legacy formatter config for TS API. |
| api/eslint.config.mts | Removes legacy ESLint config for TS API. |
| api/package.json | Removes Node package manifest for legacy TS API. |
| api/AGENTS.md | Removes legacy TS API agent instructions. |
| api/Cargo.toml | Adds Rust crate manifest + dependencies for Axum/LanceDB/Arrow/etc. |
| api/src/main.rs | Adds Axum server bootstrap, routing, env config. |
| api/src/models.rs | Adds Rust request/response + schema/registry model types. |
| api/src/db.rs | Adds schema registry persistence helpers (LanceDB table). |
| api/src/sync.rs | Adds /sync logic: validation, additive migrations, registry writes, warnings. |
| api/src/handlers.rs | Adds HTTP handlers for health/sync/insert/update/search/reindex (+ chunk support). |
| api/src/embeddings.rs | Adds Gemini batch embedding client via reqwest. |
| api/src/chunking.rs | Adds recursive-markdown chunking + chunk table schema helpers. |
| api/.gitignore | Updates ignores for Rust build + env files. |
| api/.env.example | Adds example env for GEMINI key, vector dim, debug flags, LanceDB URI. |
| TODO.md | Updates TODOs to reflect v1 /sync and env example workflow. |
| STATUS.md | Adds v1 “current behavior” contract/decisions document. |
| README.md | Rewrites README to reflect new architecture/workflow + endpoints/CLI. |
| PROJECT.md | Adds detailed product/architecture philosophy doc for v1. |
| PLAN.md | Adds comprehensive v1 build plan aligned with new contract. |
| AGENTS.md | Updates repo-wide instructions to match Rust API and createTable. |
| .gitignore | Expands ignores (env files, node outputs, .lancedb, etc.). |
| .github/workflows/ci.yml | Adds CI for Node (sdk + example-app) and Rust (fmt/clippy/test). |
Files not reviewed (1)
- example-app/package-lock.json: Language not supported

    // Process each table independently; collect all errors for better DX.
    // Important: we only write to the schema registry if all tables succeed.
    for (table_name, table_spec) in request.tables {
        let result = sync_one_table(state, &table_name, &table_spec).await;

sync_schema defers writing the schema registry until all tables succeed, but sync_one_table already creates/migrates tables inside the per-table loop. If a later table fails validation, earlier tables may have been created/migrated without a matching registry update, leaving the database and registry inconsistent. Consider doing a validation/planning pass first (no mutations), then applying all creates/migrations + registry writes only after there are no errors, or otherwise ensuring DB mutations are rolled back consistently.

    // Prepare registry update but don't write yet (write only if all tables succeed).
    let next_version = prev_version + 1;
    let registry_update = (
        table_name.to_string(),
        table_spec.clone(),
        resolved_embedding.clone(),
        next_version,

The registry schema_version is incremented (prev_version + 1) on every sync, even when the schema is unchanged. This will grow the registry table quickly and makes the version number less meaningful. Consider only incrementing/persisting a new registry entry when the schema or resolved embedding config actually changes.

    let decoder = arrow_json::ReaderBuilder::new(arrow_schema.clone()).build(json_lines.as_bytes());

    // Collect batches
    let batches_result: Result<Vec<_>, _> = decoder.unwrap().collect();

    if let Err(e) = batches_result {
        return api_error(
            StatusCode::INTERNAL_SERVER_ERROR,
            "json_to_arrow_failed",
            format!("JSON to Arrow conversion failed: {}", e),
        );
    }
    let batches = batches_result.unwrap();

decoder is a Result from arrow_json::ReaderBuilder::build(...), but it is immediately unwrap()'d. If Arrow JSON decoding fails (e.g. schema mismatch, invalid NDJSON), the server will panic instead of returning a structured error. Handle the Err case from build(...) and return an API error, and avoid the later batches_result.unwrap() pattern as well (use a single match/?).
Suggested change:

    - let decoder = arrow_json::ReaderBuilder::new(arrow_schema.clone()).build(json_lines.as_bytes());
    - // Collect batches
    - let batches_result: Result<Vec<_>, _> = decoder.unwrap().collect();
    - if let Err(e) = batches_result {
    -     return api_error(
    -         StatusCode::INTERNAL_SERVER_ERROR,
    -         "json_to_arrow_failed",
    -         format!("JSON to Arrow conversion failed: {}", e),
    -     );
    - }
    - let batches = batches_result.unwrap();
    + // Build an Arrow JSON decoder and collect all RecordBatches, returning a structured
    + // API error if decoding or collection fails instead of panicking.
    + let batches = match arrow_json::ReaderBuilder::new(arrow_schema.clone())
    +     .build(json_lines.as_bytes())
    +     // The explicit collect target lets the compiler infer the batch vector type.
    +     .and_then(|decoder| decoder.collect::<Result<Vec<_>, _>>())
    + {
    +     Ok(batches) => batches,
    +     Err(e) => {
    +         return api_error(
    +             StatusCode::INTERNAL_SERVER_ERROR,
    +             "json_to_arrow_failed",
    +             format!("JSON to Arrow conversion failed: {}", e),
    +         );
    +     }
    + };


    get: (_target, tableNameProp) => {
      const tableName = String(tableNameProp);
      return {
        insert: async (data: Infer<DB[keyof DB]> | Infer<DB[keyof DB]>[]) => {
          const records = Array.isArray(data) ? data : [data];

    // Help developers catch typos early.
    if (!tableNames.has(tableName)) {
      throw new Error(
        `Unknown table "${tableName}". Did you forget to include it in createClient({ schema: ... })?`,
      );
    }

createClient's Proxy get trap stringifies every property key and throws if it's not a known table. In Node, proxies are commonly accessed with symbol keys (e.g. util.inspect.custom, Symbol.toStringTag), which will be converted to strings like "Symbol(...)" and cause unexpected throws during logging/inspection. Handle typeof tableNameProp === "symbol" (and other non-table keys like "then") by returning undefined or a safe fallback instead of throwing.
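
One way to act on this comment is sketched below. It is a minimal illustration and not the SDK's actual code: `createClientSketch`, `TableOps`, and the stubbed `insert` are hypothetical names introduced here, and the set of string keys to ignore (`then`, `toJSON`) is an assumption about which probes matter in practice.

```typescript
// Hypothetical stand-in for the SDK's createClient; only the Proxy guard matters here.
type TableOps = { insert: (rows: unknown[]) => number };

function createClientSketch(tableNames: Set<string>): Record<string, TableOps> {
  return new Proxy({} as Record<string, TableOps>, {
    get: (_target, prop) => {
      // Symbol keys (util.inspect.custom, Symbol.toStringTag, ...) are never table names.
      if (typeof prop === "symbol") {
        return undefined;
      }
      // "then" is probed when a value is awaited; "toJSON" is probed by JSON.stringify.
      if (prop === "then" || prop === "toJSON") {
        return undefined;
      }
      // Only genuine string keys reach the typo check.
      if (!tableNames.has(prop)) {
        throw new Error(`Unknown table "${prop}".`);
      }
      // Stubbed operation: returns the number of rows it was given.
      return { insert: (rows: unknown[]) => rows.length };
    },
  });
}
```

With this guard, logging or awaiting the client no longer throws, while a misspelled table name still fails fast.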

     * Stable JSON representation used by `vexi sync`.
     */
    toJSON(): TableJson {
      return {
        version: 1,
        columns: this.columns,
      };

Table#toJSON() currently returns { version, columns: this.columns } where columns values are Field instances. That means calling table.toJSON() directly does not yield a plain JSON-serializable object, despite the docstring/type claiming a stable JSON shape. Consider converting each Field to its JSON form inside toJSON() (and updating TableJson.columns to the serialized shape) so the returned value is actually JSON data, not class instances.
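
The conversion this comment asks for could look roughly like the sketch below. The `Field` and `Table` classes here are simplified, hypothetical stand-ins (the real SDK classes are not shown on this page), and the `FieldJson` shape with `type` and `nullable` is an assumption made only for illustration.

```typescript
// Hypothetical serialized shapes; the real SDK's field JSON may carry more properties.
type FieldJson = { type: string; nullable: boolean };
type TableJson = { version: 1; columns: Record<string, FieldJson> };

// Simplified stand-in for the SDK's Field class.
class Field {
  readonly type: string;
  readonly nullable: boolean;

  constructor(type: string, nullable = false) {
    this.type = type;
    this.nullable = nullable;
  }

  toJSON(): FieldJson {
    return { type: this.type, nullable: this.nullable };
  }
}

// Simplified stand-in for the SDK's Table wrapper.
class Table {
  readonly columns: Record<string, Field>;

  constructor(columns: Record<string, Field>) {
    this.columns = columns;
  }

  // Serialize every Field eagerly so callers get plain data, not class instances.
  toJSON(): TableJson {
    const columns: Record<string, FieldJson> = {};
    for (const [name, field] of Object.entries(this.columns)) {
      columns[name] = field.toJSON();
    }
    return { version: 1, columns };
  }
}
```

Serializing eagerly means a direct `table.toJSON()` call and `JSON.stringify(table)` produce the same shape, which is what the `TableJson` type would then describe.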

    function isVexiTable(value: unknown): value is { toJSON(): unknown } {
      if (typeof value !== "object" || value === null) {
        return false;
      }
      return (
        "isVexiTable" in value &&
        (value as { isVexiTable: unknown }).isVexiTable === true
      );
    }

isVexiTable's type predicate claims the value has a toJSON() method, but the runtime check only verifies isVexiTable === true. This can misclassify arbitrary objects as tables and then produce invalid sync payloads. Also validate that toJSON exists and is a function (or explicitly check value instanceof Table if you want a stricter guard).
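
A stricter guard along the lines suggested here might look like the following sketch. It keeps the marker check from the snippet above and adds the `toJSON` check; the `SerializableTable` type name is hypothetical and introduced only for this example.

```typescript
// Hypothetical name for the shape the guard promises to callers.
type SerializableTable = { isVexiTable: true; toJSON(): unknown };

function isVexiTable(value: unknown): value is SerializableTable {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as { isVexiTable?: unknown; toJSON?: unknown };
  // Require both the marker and a callable toJSON, so the predicate matches its type.
  return candidate.isVexiTable === true && typeof candidate.toJSON === "function";
}
```

An `instanceof Table` check would be stricter still, but it can fail when two copies of the SDK are loaded, which may be why a marker property was used in the first place.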

    // If it already exists, this will return an error; we ignore it.
    let _ = db
        .create_empty_table(REGISTRY_TABLE_NAME, schema)
        .execute()
        .await;
    Ok(())

ensure_schema_registry currently swallows all errors from create_empty_table and always returns Ok(()). This can hide real initialization failures (e.g. I/O errors) and make subsequent registry operations fail later with less actionable errors. Consider returning the error unless it specifically indicates the table already exists.
Suggested change:

    - // If it already exists, this will return an error; we ignore it.
    - let _ = db
    -     .create_empty_table(REGISTRY_TABLE_NAME, schema)
    -     .execute()
    -     .await;
    - Ok(())
    + // If it already exists, this will return an error; we ignore that specific case,
    + // but propagate other errors so initialization failures are not silently swallowed.
    + match db
    +     .create_empty_table(REGISTRY_TABLE_NAME, schema)
    +     .execute()
    +     .await
    + {
    +     Ok(_) => Ok(()),
    +     Err(e) => {
    +         let msg = e.to_string();
    +         if msg.contains("already exists") {
    +             Ok(())
    +         } else {
    +             Err(msg)
    +         }
    +     }
    + }

This pull request introduces foundational improvements to the Vexi project, focusing on documentation, CI/CD automation, and clarifying the development workflow. The most important changes are the addition of a comprehensive project overview, a status/decisions document, and a CI workflow for both Node and Rust components. Several minor updates clarify API usage, project structure, and environment configuration.
Project documentation and philosophy:

- Added `PROJECT.md` with a detailed overview of Vexi's goals, schema API, developer workflow, backend responsibilities, and SDK/client usage. This document explains the end-to-end type safety, automatic embeddings, migration philosophy, and the intended developer experience.
- Added `STATUS.md` to track v1 status, non-negotiable decisions, completed features, and next steps. This document clarifies the current API contract, migration policy, embedding provider, and search/chunking strategies.

CI/CD and environment configuration:

- Added `.github/workflows/ci.yml` for automated CI, running lint/build/typecheck for the Node SDK and example-app, and formatting/lint/tests for the Rust API.
- Added `api/.env.example` and updated `.gitignore` to ignore `.env` and Rust build artifacts, clarifying the environment variables required for embeddings, LanceDB, and debug endpoints.

API and SDK usage clarification:

- Updated `README.md` to reflect the new project philosophy, quickstart instructions, schema/client usage, CLI commands, API endpoints, and limitations. The README now includes a mermaid diagram and emphasizes the schema registry contract.
- Updated `AGENTS.md` and `TODO.md`, and removed obsolete instructions in `api/AGENTS.md`, to clarify the API server (now Rust/Axum), validation approach, and migration endpoint.

Project structure and conventions:

- … `createTable`), and clarified the removal of legacy endpoints and migration strategies.

Summary:
These changes establish a clear foundation for Vexi's development and usage, automate CI for reliability, and remove legacy ambiguity in documentation and project structure.