Make Vexi production-ready #2

Merged
marcoshernanz merged 18 commits into main from embedding-queue on Feb 13, 2026

Conversation

@marcoshernanz
Owner

This pull request introduces foundational improvements to the Vexi project, focusing on documentation, CI/CD automation, and clarifying the development workflow. The most important changes are the addition of a comprehensive project overview, a status/decisions document, and a CI workflow for both Node and Rust components. Several minor updates clarify API usage, project structure, and environment configuration.

Project documentation and philosophy:

  • Added PROJECT.md with a detailed overview of Vexi's goals, schema API, developer workflow, backend responsibilities, and SDK/client usage. This document explains the end-to-end type safety, automatic embeddings, migration philosophy, and the intended developer experience.
  • Added STATUS.md to track v1 status, non-negotiable decisions, completed features, and next steps. This document clarifies the current API contract, migration policy, embedding provider, and search/chunking strategies.

CI/CD and environment configuration:

  • Added .github/workflows/ci.yml for automated CI, running lint/build/typecheck for Node SDK/example-app and formatting/lint/tests for Rust API.
  • Added api/.env.example, which documents the environment variables required for embeddings, LanceDB, and debug endpoints, and updated .gitignore to ignore .env files and Rust build artifacts.

API and SDK usage clarification:

  • Updated README.md to reflect the new project philosophy, quickstart instructions, schema/client usage, CLI commands, API endpoints, and limitations. The README now includes a mermaid diagram and emphasizes the schema registry contract.
  • Updated AGENTS.md and TODO.md, and removed obsolete instructions in api/AGENTS.md, to clarify the API server (now Rust/Axum), validation approach, and migration endpoint.

Project structure and conventions:

  • Updated documentation to reflect the new backend stack (Rust/Axum, LanceDB) and the canonical schema API (createTable), and to clarify the removal of legacy endpoints and migration strategies.

Summary:
These changes establish a clear foundation for Vexi's development and usage, automate CI for reliability, and remove legacy ambiguity in documentation and project structure.

…D generation; remove unused embedding config logic
…ed documentation and remove OpenAI references
…lement chunk table creation and embedding handling
docs: update PROJECT.md to reflect embedding provider change to Gemini only

docs: revise README.md for clarity on Vexi's functionality and usage

chore: update STATUS.md with next steps for documentation cleanup

chore: update TODO.md to include .env.example file addition

feat: add api/.env.example for environment variable configuration

fix: update example-app to use new client import and improve logging

chore: update example-app package-lock.json for dependency versions

chore: update example-app package.json to include reindex script

fix: refine schema.ts in example-app for embedding configuration
Copilot AI review requested due to automatic review settings February 13, 2026 20:18

Copilot AI left a comment

Pull request overview

This PR makes Vexi “v1/production-ready” by formalizing the schema contract and workflow end-to-end: a runtime Table wrapper + stable-ish schema serialization in the SDK/CLI, a new Rust/Axum API with a schema registry and additive migrations, plus documentation and CI to codify the intended developer experience.

Changes:

  • SDK: introduce createTable/Table wrapper, richer inferred types (Row, InsertInput, etc.), and expand the client to support insert (returning rows), update, and search.
  • CLI: switch to v1-only one-shot POST /sync, improve table detection and output modes, add reindex.
  • API: replace the legacy TS/Fastify server with a Rust/Axum server implementing /sync, insert/update/search/reindex, schema registry persistence, and Gemini embedding integration; add env/CI/docs updates.

Reviewed changes

Copilot reviewed 35 out of 39 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| sdk/src/schema.ts | Adds Table wrapper + v1 table types (Row, InsertInput, etc.) and createTable. |
| sdk/src/index.ts | Re-exports new schema APIs/types from the SDK entrypoint. |
| sdk/src/client.ts | Updates typed client to support insert/update/search + better error handling. |
| sdk/src/cli.ts | Reworks CLI sync to v1 one-shot /sync, adds reindex command and improved output. |
| sdk/AGENTS.md | Updates SDK conventions (notably “type-only, no interface”) and createTable docs. |
| example-app/schema.ts | Updates example schema to consumer-style imports and createTable. |
| example-app/main.ts | Updates example usage to new insert/update/search behavior and output. |
| example-app/package.json | Uses local vexi dependency and adds reindex script; pins tool versions. |
| example-app/package-lock.json | Locks updated deps including local vexi link. |
| api/tsconfig.json | Removes legacy TS API config (API is now Rust). |
| api/tsconfig.build.json | Removes legacy TS build config. |
| api/src/validator.ts | Removes Zod validator from legacy TS API. |
| api/src/schema.ts | Removes Arrow schema conversion from legacy TS API. |
| api/src/index.ts | Removes legacy Fastify server entrypoint. |
| api/scripts/inspect.ts | Removes legacy TS inspection script. |
| api/prettier.config.mts | Removes legacy formatter config for TS API. |
| api/eslint.config.mts | Removes legacy ESLint config for TS API. |
| api/package.json | Removes Node package manifest for legacy TS API. |
| api/AGENTS.md | Removes legacy TS API agent instructions. |
| api/Cargo.toml | Adds Rust crate manifest + dependencies for Axum/LanceDB/Arrow/etc. |
| api/src/main.rs | Adds Axum server bootstrap, routing, env config. |
| api/src/models.rs | Adds Rust request/response + schema/registry model types. |
| api/src/db.rs | Adds schema registry persistence helpers (LanceDB table). |
| api/src/sync.rs | Adds /sync logic: validation, additive migrations, registry writes, warnings. |
| api/src/handlers.rs | Adds HTTP handlers for health/sync/insert/update/search/reindex (+ chunk support). |
| api/src/embeddings.rs | Adds Gemini batch embedding client via reqwest. |
| api/src/chunking.rs | Adds recursive-markdown chunking + chunk table schema helpers. |
| api/.gitignore | Updates ignores for Rust build + env files. |
| api/.env.example | Adds example env for GEMINI key, vector dim, debug flags, LanceDB URI. |
| TODO.md | Updates TODOs to reflect v1 /sync and env example workflow. |
| STATUS.md | Adds v1 “current behavior” contract/decisions document. |
| README.md | Rewrites README to reflect new architecture/workflow + endpoints/CLI. |
| PROJECT.md | Adds detailed product/architecture philosophy doc for v1. |
| PLAN.md | Adds comprehensive v1 build plan aligned with new contract. |
| AGENTS.md | Updates repo-wide instructions to match Rust API and createTable. |
| .gitignore | Expands ignores (env files, node outputs, .lancedb, etc.). |
| .github/workflows/ci.yml | Adds CI for Node (sdk + example-app) and Rust (fmt/clippy/test). |
Files not reviewed (1)
  • example-app/package-lock.json: Language not supported

Comment on lines +255 to +258
    // Process each table independently; collect all errors for better DX.
    // Important: we only write to the schema registry if all tables succeed.
    for (table_name, table_spec) in request.tables {
        let result = sync_one_table(state, &table_name, &table_spec).await;

Copilot AI Feb 13, 2026

sync_schema defers writing the schema registry until all tables succeed, but sync_one_table already creates/migrates tables inside the per-table loop. If a later table fails validation, earlier tables may have been created/migrated without a matching registry update, leaving the database and registry inconsistent. Consider doing a validation/planning pass first (no mutations), then applying all creates/migrations + registry writes only after there are no errors, or otherwise ensuring DB mutations are rolled back consistently.

Copilot uses AI. Check for mistakes.
Comment on lines +431 to +437
    // Prepare registry update but don't write yet (write only if all tables succeed).
    let next_version = prev_version + 1;
    let registry_update = (
        table_name.to_string(),
        table_spec.clone(),
        resolved_embedding.clone(),
        next_version,

Copilot AI Feb 13, 2026

The registry schema_version is incremented (prev_version + 1) on every sync, even when the schema is unchanged. This will grow the registry table quickly and makes the version number less meaningful. Consider only incrementing/persisting a new registry entry when the schema or resolved embedding config actually changes.

Comment on lines +394 to +407
    let decoder = arrow_json::ReaderBuilder::new(arrow_schema.clone()).build(json_lines.as_bytes());

    // Collect batches
    let batches_result: Result<Vec<_>, _> = decoder.unwrap().collect();

    if let Err(e) = batches_result {
        return api_error(
            StatusCode::INTERNAL_SERVER_ERROR,
            "json_to_arrow_failed",
            format!("JSON to Arrow conversion failed: {}", e),
        );
    }
    let batches = batches_result.unwrap();

Copilot AI Feb 13, 2026

decoder is a Result from arrow_json::ReaderBuilder::build(...), but it is immediately unwrap()'d. If Arrow JSON decoding fails (e.g. schema mismatch, invalid NDJSON), the server will panic instead of returning a structured error. Handle the Err case from build(...) and return an API error, and avoid the later batches_result.unwrap() pattern as well (use a single match/?).

Suggested change
    - let decoder = arrow_json::ReaderBuilder::new(arrow_schema.clone()).build(json_lines.as_bytes());
    - // Collect batches
    - let batches_result: Result<Vec<_>, _> = decoder.unwrap().collect();
    - if let Err(e) = batches_result {
    -     return api_error(
    -         StatusCode::INTERNAL_SERVER_ERROR,
    -         "json_to_arrow_failed",
    -         format!("JSON to Arrow conversion failed: {}", e),
    -     );
    - }
    - let batches = batches_result.unwrap();
    + // Build an Arrow JSON decoder and collect all RecordBatches, returning a structured
    + // API error if decoding or collection fails instead of panicking.
    + let batches = match arrow_json::ReaderBuilder::new(arrow_schema.clone())
    +     .build(json_lines.as_bytes())
    +     .and_then(|decoder| decoder.collect())
    + {
    +     Ok(batches) => batches,
    +     Err(e) => {
    +         return api_error(
    +             StatusCode::INTERNAL_SERVER_ERROR,
    +             "json_to_arrow_failed",
    +             format!("JSON to Arrow conversion failed: {}", e),
    +         );
    +     }
    + };

Comment on lines 154 to +162
get: (_target, tableNameProp) => {
const tableName = String(tableNameProp);
return {
insert: async (data: Infer<DB[keyof DB]> | Infer<DB[keyof DB]>[]) => {
const records = Array.isArray(data) ? data : [data];

// Help developers catch typos early.
if (!tableNames.has(tableName)) {
throw new Error(
`Unknown table "${tableName}". Did you forget to include it in createClient({ schema: ... })?`,
);
}

Copilot AI Feb 13, 2026

createClient's Proxy get trap stringifies every property key and throws if it's not a known table. In Node, proxies are commonly accessed with symbol keys (e.g. util.inspect.custom, Symbol.toStringTag), which will be converted to strings like "Symbol(...)" and cause unexpected throws during logging/inspection. Handle typeof tableNameProp === "symbol" (and other non-table keys like "then") by returning undefined or a safe fallback instead of throwing.
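
One way the guard described above could look is sketched below. This is a simplified stand-in, not the SDK's actual createClient: the names createClientSketch and TableApi, and the placeholder insert that only returns a row count, are assumptions made for illustration.

```typescript
// Hypothetical, simplified stand-in for the SDK's createClient (not the real Vexi API).
type TableApi = { insert: (rows: unknown[]) => Promise<number> };

function createClientSketch(
  tableNames: ReadonlySet<string>,
): Record<string, TableApi> {
  const target: Record<string, TableApi> = {};
  return new Proxy(target, {
    get: (_target, prop) => {
      // Symbol keys (util.inspect.custom, Symbol.toStringTag, ...) are never
      // table names, so return undefined instead of stringifying and throwing.
      if (typeof prop === "symbol") return undefined;
      // "then" is probed whenever a value is awaited or resolved as a promise.
      if (prop === "then") return undefined;
      // Only genuine string keys reach the typo check.
      if (!tableNames.has(prop)) {
        throw new Error(`Unknown table "${prop}".`);
      }
      // Placeholder operation: resolves to the number of rows passed in.
      return { insert: async (rows: unknown[]) => rows.length };
    },
  });
}
```

With this shape, logging or awaiting the client no longer trips the typo check, while a misspelled table name still throws.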

Comment on lines +55 to +61
       * Stable JSON representation used by `vexi sync`.
       */
      toJSON(): TableJson {
        return {
          version: 1,
          columns: this.columns,
        };

Copilot AI Feb 13, 2026

Table#toJSON() currently returns { version, columns: this.columns } where columns values are Field instances. That means calling table.toJSON() directly does not yield a plain JSON-serializable object, despite the docstring/type claiming a stable JSON shape. Consider converting each Field to its JSON form inside toJSON() (and updating TableJson.columns to the serialized shape) so the returned value is actually JSON data, not class instances.
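
A minimal sketch of that conversion follows. The Field and Table classes here are reduced stand-ins, and the FieldJson shape (type, optional) is an assumption; the real SDK fields likely carry more metadata.

```typescript
// Hypothetical minimal shapes; the real SDK's Field/Table classes may differ.
type FieldJson = { type: string; optional: boolean };
type TableJson = { version: 1; columns: Record<string, FieldJson> };

class Field {
  readonly type: string;
  readonly optional: boolean;

  constructor(type: string, optional = false) {
    this.type = type;
    this.optional = optional;
  }

  // Plain-data form of the field, with no class instance attached.
  toJSON(): FieldJson {
    return { type: this.type, optional: this.optional };
  }
}

class Table {
  readonly columns: Record<string, Field>;

  constructor(columns: Record<string, Field>) {
    this.columns = columns;
  }

  // Serialize each Field eagerly so the returned object is plain JSON data,
  // even when toJSON() is called directly rather than via JSON.stringify.
  toJSON(): TableJson {
    const columns: Record<string, FieldJson> = {};
    for (const [name, field] of Object.entries(this.columns)) {
      columns[name] = field.toJSON();
    }
    return { version: 1, columns };
  }
}
```

The key design point is that TableJson.columns is typed as the serialized shape, so the type and the runtime value agree.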

Comment on lines +114 to +122
    function isVexiTable(value: unknown): value is { toJSON(): unknown } {
      if (typeof value !== "object" || value === null) {
        return false;
      }
      return (
        "isVexiTable" in value &&
        (value as { isVexiTable: unknown }).isVexiTable === true
      );
    }

Copilot AI Feb 13, 2026

isVexiTable's type predicate claims the value has a toJSON() method, but the runtime check only verifies isVexiTable === true. This can misclassify arbitrary objects as tables and then produce invalid sync payloads. Also validate that toJSON exists and is a function (or explicitly check value instanceof Table if you want a stricter guard).
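
A stricter guard along those lines might look like the sketch below. It keeps the marker-property approach rather than instanceof, which is an assumption about what the CLI needs (for example, tolerating tables that come from a different copy of the SDK).

```typescript
// Sketch of a guard whose runtime checks match what its type predicate claims.
function isVexiTable(
  value: unknown,
): value is { isVexiTable: true; toJSON(): unknown } {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as { isVexiTable?: unknown; toJSON?: unknown };
  // Require both the marker and a callable toJSON before treating it as a table.
  return (
    candidate.isVexiTable === true && typeof candidate.toJSON === "function"
  );
}
```

An object that only carries the marker, such as `{ isVexiTable: true }`, is now rejected instead of producing an invalid sync payload.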

Comment on lines +27 to +32
    // If it already exists, this will return an error; we ignore it.
    let _ = db
        .create_empty_table(REGISTRY_TABLE_NAME, schema)
        .execute()
        .await;
    Ok(())

Copilot AI Feb 13, 2026

ensure_schema_registry currently swallows all errors from create_empty_table and always returns Ok(()). This can hide real initialization failures (e.g. I/O errors) and make subsequent registry operations fail later with less actionable errors. Consider returning the error unless it specifically indicates the table already exists.

Suggested change
    - // If it already exists, this will return an error; we ignore it.
    - let _ = db
    -     .create_empty_table(REGISTRY_TABLE_NAME, schema)
    -     .execute()
    -     .await;
    - Ok(())
    + // If it already exists, this will return an error; we ignore that specific case,
    + // but propagate other errors so initialization failures are not silently swallowed.
    + match db
    +     .create_empty_table(REGISTRY_TABLE_NAME, schema)
    +     .execute()
    +     .await
    + {
    +     Ok(_) => Ok(()),
    +     Err(e) => {
    +         let msg = e.to_string();
    +         if msg.contains("already exists") {
    +             Ok(())
    +         } else {
    +             Err(msg)
    +         }
    +     }
    + }

marcoshernanz merged commit 7154365 into main on Feb 13, 2026
5 checks passed
2 participants