Make Vexi production-ready by marcoshernanz · Pull Request #2 · marcoshernanz/vexi

marcoshernanz · 2026-02-13T20:18:07Z

This pull request introduces foundational improvements to the Vexi project, focusing on documentation, CI/CD automation, and clarifying the development workflow. The most important changes are the addition of a comprehensive project overview, a status/decisions document, and a CI workflow for both Node and Rust components. Several minor updates clarify API usage, project structure, and environment configuration.

Project documentation and philosophy:

Added PROJECT.md with a detailed overview of Vexi's goals, schema API, developer workflow, backend responsibilities, and SDK/client usage. This document explains the end-to-end type safety, automatic embeddings, migration philosophy, and the intended developer experience.
Added STATUS.md to track v1 status, non-negotiable decisions, completed features, and next steps. This document clarifies the current API contract, migration policy, embedding provider, and search/chunking strategies.

CI/CD and environment configuration:

Added .github/workflows/ci.yml for automated CI, running lint/build/typecheck for Node SDK/example-app and formatting/lint/tests for Rust API.
Added api/.env.example and updated .gitignore to ignore .env and Rust build artifacts, clarifying environment variables required for embeddings, LanceDB, and debug endpoints.

API and SDK usage clarification:

Updated README.md to reflect the new project philosophy, quickstart instructions, schema/client usage, CLI commands, API endpoints, and limitations. The README now includes a mermaid diagram and emphasizes the schema registry contract.
Updated AGENTS.md, TODO.md, and removed obsolete instructions in api/AGENTS.md to clarify the API server (now Rust/Axum), validation approach, and migration endpoint.

Project structure and conventions:

Updated documentation to reflect the new backend stack (Rust/Axum, LanceDB), the canonical schema API (createTable), and clarified the removal of legacy endpoints and migration strategies.

Summary:
These changes establish a clear foundation for Vexi's development and usage, automate CI for reliability, and remove legacy ambiguity in documentation and project structure.

docs: update PROJECT.md to reflect embedding provider change to Gemini only
docs: revise README.md for clarity on Vexi's functionality and usage
chore: update STATUS.md with next steps for documentation cleanup
chore: update TODO.md to include .env.example file addition
feat: add api/.env.example for environment variable configuration
fix: update example-app to use new client import and improve logging
chore: update example-app package-lock.json for dependency versions
chore: update example-app package.json to include reindex script
fix: refine schema.ts in example-app for embedding configuration

Copilot

Pull request overview

This PR makes Vexi “v1/production-ready” by formalizing the schema contract and workflow end-to-end: a runtime Table wrapper + stable-ish schema serialization in the SDK/CLI, a new Rust/Axum API with a schema registry and additive migrations, plus documentation and CI to codify the intended developer experience.

Changes:

SDK: introduce createTable/Table wrapper, richer inferred types (Row, InsertInput, etc.), and expand the client to support insert (returning rows), update, and search.
CLI: switch to v1-only one-shot POST /sync, improve table detection and output modes, add reindex.
API: replace the legacy TS/Fastify server with a Rust/Axum server implementing /sync, insert/update/search/reindex, schema registry persistence, and Gemini embedding integration; add env/CI/docs updates.

Reviewed changes

Copilot reviewed 35 out of 39 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| sdk/src/schema.ts | Adds `Table` wrapper + v1 table types (`Row`, `InsertInput`, etc.) and `createTable`. |
| sdk/src/index.ts | Re-exports new schema APIs/types from the SDK entrypoint. |
| sdk/src/client.ts | Updates typed client to support insert/update/search + better error handling. |
| sdk/src/cli.ts | Reworks CLI sync to v1 one-shot `/sync`, adds reindex command and improved output. |
| sdk/AGENTS.md | Updates SDK conventions (notably “type-only, no interface”) and `createTable` docs. |
| example-app/schema.ts | Updates example schema to consumer-style imports and `createTable`. |
| example-app/main.ts | Updates example usage to new insert/update/search behavior and output. |
| example-app/package.json | Uses local `vexi` dependency and adds `reindex` script; pins tool versions. |
| example-app/package-lock.json | Locks updated deps including local `vexi` link. |
| api/tsconfig.json | Removes legacy TS API config (API is now Rust). |
| api/tsconfig.build.json | Removes legacy TS build config. |
| api/src/validator.ts | Removes Zod validator from legacy TS API. |
| api/src/schema.ts | Removes Arrow schema conversion from legacy TS API. |
| api/src/index.ts | Removes legacy Fastify server entrypoint. |
| api/scripts/inspect.ts | Removes legacy TS inspection script. |
| api/prettier.config.mts | Removes legacy formatter config for TS API. |
| api/eslint.config.mts | Removes legacy ESLint config for TS API. |
| api/package.json | Removes Node package manifest for legacy TS API. |
| api/AGENTS.md | Removes legacy TS API agent instructions. |
| api/Cargo.toml | Adds Rust crate manifest + dependencies for Axum/LanceDB/Arrow/etc. |
| api/src/main.rs | Adds Axum server bootstrap, routing, env config. |
| api/src/models.rs | Adds Rust request/response + schema/registry model types. |
| api/src/db.rs | Adds schema registry persistence helpers (LanceDB table). |
| api/src/sync.rs | Adds `/sync` logic: validation, additive migrations, registry writes, warnings. |
| api/src/handlers.rs | Adds HTTP handlers for health/sync/insert/update/search/reindex (+ chunk support). |
| api/src/embeddings.rs | Adds Gemini batch embedding client via reqwest. |
| api/src/chunking.rs | Adds recursive-markdown chunking + chunk table schema helpers. |
| api/.gitignore | Updates ignores for Rust build + env files. |
| api/.env.example | Adds example env for GEMINI key, vector dim, debug flags, LanceDB URI. |
| TODO.md | Updates TODOs to reflect v1 `/sync` and env example workflow. |
| STATUS.md | Adds v1 “current behavior” contract/decisions document. |
| README.md | Rewrites README to reflect new architecture/workflow + endpoints/CLI. |
| PROJECT.md | Adds detailed product/architecture philosophy doc for v1. |
| PLAN.md | Adds comprehensive v1 build plan aligned with new contract. |
| AGENTS.md | Updates repo-wide instructions to match Rust API and `createTable`. |
| .gitignore | Expands ignores (env files, node outputs, .lancedb, etc.). |
| .github/workflows/ci.yml | Adds CI for Node (sdk + example-app) and Rust (fmt/clippy/test). |

Files not reviewed (1)

example-app/package-lock.json: Language not supported

Copilot · 2026-02-13T20:28:28Z

api/src/sync.rs

+    // Process each table independently; collect all errors for better DX.
+    // Important: we only write to the schema registry if all tables succeed.
+    for (table_name, table_spec) in request.tables {
+        let result = sync_one_table(state, &table_name, &table_spec).await;


sync_schema defers writing the schema registry until all tables succeed, but sync_one_table already creates/migrates tables inside the per-table loop. If a later table fails validation, earlier tables may have been created/migrated without a matching registry update, leaving the database and registry inconsistent. Consider doing a validation/planning pass first (no mutations), then applying all creates/migrations + registry writes only after there are no errors, or otherwise ensuring DB mutations are rolled back consistently.

Copilot · 2026-02-13T20:28:28Z

api/src/sync.rs

+    // Prepare registry update but don't write yet (write only if all tables succeed).
+    let next_version = prev_version + 1;
+    let registry_update = (
+        table_name.to_string(),
+        table_spec.clone(),
+        resolved_embedding.clone(),
+        next_version,


The registry schema_version is incremented (prev_version + 1) on every sync, even when the schema is unchanged. This will grow the registry table quickly and makes the version number less meaningful. Consider only incrementing/persisting a new registry entry when the schema or resolved embedding config actually changes.

Copilot · 2026-02-13T20:28:28Z

api/src/handlers.rs

+    let decoder = arrow_json::ReaderBuilder::new(arrow_schema.clone()).build(json_lines.as_bytes());
+
+    // Collect batches
+    let batches_result: Result<Vec<_>, _> = decoder.unwrap().collect();
+
+    if let Err(e) = batches_result {
+        return api_error(
+            StatusCode::INTERNAL_SERVER_ERROR,
+            "json_to_arrow_failed",
+            format!("JSON to Arrow conversion failed: {}", e),
+        );
+    }
+    let batches = batches_result.unwrap();
+


decoder is a Result from arrow_json::ReaderBuilder::build(...), but it is immediately unwrap()'d. If Arrow JSON decoding fails (e.g. schema mismatch, invalid NDJSON), the server will panic instead of returning a structured error. Handle the Err case from build(...) and return an API error, and avoid the later batches_result.unwrap() pattern as well (use a single match/?).

Suggested change

-    let decoder = arrow_json::ReaderBuilder::new(arrow_schema.clone()).build(json_lines.as_bytes());
-
-    // Collect batches
-    let batches_result: Result<Vec<_>, _> = decoder.unwrap().collect();
-
-    if let Err(e) = batches_result {
-        return api_error(
-            StatusCode::INTERNAL_SERVER_ERROR,
-            "json_to_arrow_failed",
-            format!("JSON to Arrow conversion failed: {}", e),
-        );
-    }
-    let batches = batches_result.unwrap();
+    // Build an Arrow JSON decoder and collect all RecordBatches, returning a structured
+    // API error if decoding or collection fails instead of panicking.
+    let batches = match arrow_json::ReaderBuilder::new(arrow_schema.clone())
+        .build(json_lines.as_bytes())
+        .and_then(|decoder| decoder.collect())
+    {
+        Ok(batches) => batches,
+        Err(e) => {
+            return api_error(
+                StatusCode::INTERNAL_SERVER_ERROR,
+                "json_to_arrow_failed",
+                format!("JSON to Arrow conversion failed: {}", e),
+            );
+        }
+    };

Copilot · 2026-02-13T20:28:29Z

sdk/src/client.ts

      get: (_target, tableNameProp) => {
        const tableName = String(tableNameProp);
-        return {
-          insert: async (data: Infer<DB[keyof DB]> | Infer<DB[keyof DB]>[]) => {
-            const records = Array.isArray(data) ? data : [data];
+
+        // Help developers catch typos early.
+        if (!tableNames.has(tableName)) {
+          throw new Error(
+            `Unknown table "${tableName}". Did you forget to include it in createClient({ schema: ... })?`,
+          );
+        }


createClient's Proxy get trap stringifies every property key and throws if it's not a known table. In Node, proxies are commonly accessed with symbol keys (e.g. util.inspect.custom, Symbol.toStringTag), which will be converted to strings like "Symbol(...)" and cause unexpected throws during logging/inspection. Handle typeof tableNameProp === "symbol" (and other non-table keys like "then") by returning undefined or a safe fallback instead of throwing.
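
A minimal sketch of the guard described above, assuming a simplified proxy. `createTableProxy`, `TableHandle`, and `PASSTHROUGH_KEYS` are hypothetical stand-ins for illustration, not the SDK's actual `createClient` internals, and the list of pass-through string keys is an assumption:

```typescript
// Hypothetical stand-in for the per-table API object the real client returns.
type TableHandle = { tableName: string };

// String keys that runtimes probe on arbitrary objects: `await` checks "then",
// JSON.stringify checks "toJSON". The exact list is an assumption.
const PASSTHROUGH_KEYS = new Set<string>(["then", "toJSON"]);

function createTableProxy(tableNames: Set<string>): Record<string, TableHandle> {
  return new Proxy({} as Record<string, TableHandle>, {
    get: (_target, prop) => {
      // Symbol keys (util.inspect.custom, Symbol.toStringTag, ...) can never be
      // table names, so answer them quietly instead of throwing.
      if (typeof prop === "symbol") return undefined;

      // Well-known probe names also get a safe fallback.
      if (PASSTHROUGH_KEYS.has(prop)) return undefined;

      // Anything else is treated as a table access, so typos still fail loudly.
      if (!tableNames.has(prop)) {
        throw new Error(
          `Unknown table "${prop}". Did you forget to include it in createClient({ schema: ... })?`,
        );
      }
      return { tableName: prop };
    },
  });
}
```

With this shape, logging or awaiting the proxy should no longer throw, while a misspelled table name still produces the original error message.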

Copilot · 2026-02-13T20:28:29Z

sdk/src/schema.ts

+   * Stable JSON representation used by `vexi sync`.
+   */
+  toJSON(): TableJson {
+    return {
+      version: 1,
+      columns: this.columns,
+    };


Table#toJSON() currently returns { version, columns: this.columns } where columns values are Field instances. That means calling table.toJSON() directly does not yield a plain JSON-serializable object, despite the docstring/type claiming a stable JSON shape. Consider converting each Field to its JSON form inside toJSON() (and updating TableJson.columns to the serialized shape) so the returned value is actually JSON data, not class instances.
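
One way this could look, sketched with simplified stand-in classes. The `Field` shape (`type`, `optional`) and the `FieldJson` type are assumptions made for illustration and do not reflect the SDK's real field definitions:

```typescript
// Assumed serialized form of a single column.
type FieldJson = { type: string; optional: boolean };

// TableJson.columns now holds plain data rather than Field instances.
type TableJson = { version: 1; columns: Record<string, FieldJson> };

// Simplified stand-in for the SDK's Field class.
class Field {
  type: string;
  optional: boolean;

  constructor(type: string, optional: boolean = false) {
    this.type = type;
    this.optional = optional;
  }

  toJSON(): FieldJson {
    return { type: this.type, optional: this.optional };
  }
}

// Simplified stand-in for the SDK's Table wrapper.
class Table {
  columns: Record<string, Field>;

  constructor(columns: Record<string, Field>) {
    this.columns = columns;
  }

  // Serialize each Field eagerly so the result is plain JSON data even when
  // toJSON() is called directly rather than through JSON.stringify.
  toJSON(): TableJson {
    const columns: Record<string, FieldJson> = {};
    for (const name of Object.keys(this.columns)) {
      columns[name] = this.columns[name].toJSON();
    }
    return { version: 1, columns };
  }
}
```

Serializing eagerly keeps the declared `TableJson` type honest: callers that inspect or hash the result of `toJSON()` see the same data that `JSON.stringify` would emit.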

Copilot · 2026-02-13T20:28:29Z

sdk/src/cli.ts

+function isVexiTable(value: unknown): value is { toJSON(): unknown } {
+  if (typeof value !== "object" || value === null) {
+    return false;
  }
+  return (
+    "isVexiTable" in value &&
+    (value as { isVexiTable: unknown }).isVexiTable === true
+  );
+}


isVexiTable's type predicate claims the value has a toJSON() method, but the runtime check only verifies isVexiTable === true. This can misclassify arbitrary objects as tables and then produce invalid sync payloads. Also validate that toJSON exists and is a function (or explicitly check value instanceof Table if you want a stricter guard).
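
A sketch of the stricter guard, keeping the duck-typed approach rather than `instanceof`. The `VexiTableLike` alias is a hypothetical name introduced here so the predicate matches what is actually checked at runtime:

```typescript
// Hypothetical alias: the shape the guard actually verifies.
type VexiTableLike = { isVexiTable: true; toJSON(): unknown };

function isVexiTable(value: unknown): value is VexiTableLike {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as { isVexiTable?: unknown; toJSON?: unknown };
  // Require both the brand flag and a callable toJSON, so the type predicate
  // and the runtime check agree.
  return candidate.isVexiTable === true && typeof candidate.toJSON === "function";
}
```

An object that only carries `isVexiTable: true` is now rejected, so it cannot reach the sync payload builder without a usable `toJSON`.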

Copilot · 2026-02-13T20:28:30Z

api/src/db.rs

+    // If it already exists, this will return an error; we ignore it.
+    let _ = db
+        .create_empty_table(REGISTRY_TABLE_NAME, schema)
+        .execute()
+        .await;
+    Ok(())


ensure_schema_registry currently swallows all errors from create_empty_table and always returns Ok(()). This can hide real initialization failures (e.g. I/O errors) and make subsequent registry operations fail later with less actionable errors. Consider returning the error unless it specifically indicates the table already exists.

Suggested change

-    // If it already exists, this will return an error; we ignore it.
-    let _ = db
-        .create_empty_table(REGISTRY_TABLE_NAME, schema)
-        .execute()
-        .await;
-    Ok(())
+    // If it already exists, this will return an error; we ignore that specific case,
+    // but propagate other errors so initialization failures are not silently swallowed.
+    match db
+        .create_empty_table(REGISTRY_TABLE_NAME, schema)
+        .execute()
+        .await
+    {
+        Ok(_) => Ok(()),
+        Err(e) => {
+            let msg = e.to_string();
+            if msg.contains("already exists") {
+                Ok(())
+            } else {
+                Err(msg)
+            }
+        }
+    }

marcoshernanz added 17 commits January 20, 2026 00:43

feat: migrate node.js api to rust
e582446

feat: implement embedding configuration retrieval and embedding generation functionality
b1f7f9f

feat: add initial project documentation for Vexi database
5a5e496

feat: add comprehensive build plan for Vexi v1 development
5b838a7

feat: enhance API health check and insert functionality with improved error handling and NDJSON support
0803967

feat: update SDK to use createTable for schema definitions and enhance type safety in client operations
dc9f8c2

feat: enhance CLI sync command with fallback to legacy endpoint and improved error handling
05b029e

feat: implement schema registry and sync functionality with embedding support
f8a3667

feat: refactor insert functionality to use new registry entry and UUID generation; remove unused embedding config logic
0d66bf1

feat: migrate embedding functionality to use Gemini API; update related documentation and remove OpenAI references
a4ce992

feat: implement vector search functionality using Gemini embeddings; update related API and SDK logic
d16ca81

feat: implement PATCH endpoint for updating rows; add validation and embedding recomputation logic
afe06ec

feat: add chunking functionality for recursive markdown strategy; implement chunk table creation and embedding handling
21aeb66

feat: add reindexing functionality for tables; implement reindex endpoint and related request/response models
5fd1551

feat: standardize api and sdk
f3691cc

feat: add CI workflow for Node and Rust; include linting, building, and testing steps
5419c78

Copilot AI review requested due to automatic review settings February 13, 2026 20:18

Copilot started reviewing on behalf of marcoshernanz February 13, 2026 20:18

ci: install protoc for rust build

49febec

Copilot AI reviewed Feb 13, 2026

marcoshernanz merged commit 7154365 into main Feb 13, 2026
5 checks passed

marcoshernanz merged 18 commits into main from embedding-queue
