Velaria Python Ecosystem

This document is the entrypoint for Velaria's supported Python ecosystem layer.

Python is a supported ingress, interop, and packaging surface. It is not the execution core. Core semantics still come from the native kernel and the runtime contract in docs/runtime-contract.md.

Scope

Supported

The supported Python ecosystem includes:

the velaria/ package and Session API
Arrow ingestion and Arrow output
uv-based local workflow
native extension build
wheel / native wheel packaging
the supported CLI entrypoint velaria_cli.py
Excel ingestion via read_excel(...)
Bitable adapters and stream source integration
custom source / custom sink adapters
vector search and vector explain APIs

Examples

Examples and helper assets include:

examples/demo_batch_sql_arrow.py
examples/demo_stream_sql.py
examples/demo_bitable_group_by_owner.py
examples/demo_vector_search.py
benchmarks/bench_arrow_ingestion.py
local ecosystem scripts and skills

Experimental

The Python experimental area is currently reserved under python_api/experimental/.

Anything placed there is explicitly outside the supported ecosystem surface until it is promoted into velaria/, velaria_cli.py, or a supported adapter module.

Not In Scope

Python does not define:

execution hot-path semantics
a separate progress schema
a separate checkpoint contract
a separate vector scoring implementation for supported APIs
Python UDFs in the hot path

API Surface

Main Session API:

Session.read_csv(...)
Session.sql(...)
Session.create_dataframe_from_arrow(...)
Session.create_stream_from_arrow(...)
Session.create_temp_view(...)
Session.read_stream_csv_dir(...)
Session.stream_sql(...)
Session.explain_stream_sql(...)
Session.start_stream_sql(...)
Session.vector_search(...)
Session.explain_vector_search(...)

Additional ecosystem helpers:

read_excel(...)
CustomArrowStreamSource
CustomArrowStreamSink
create_stream_from_custom_source(...)
consume_arrow_batches_with_custom_sink(...)

Mapping rule:

Python names may be ecosystem-friendly
behavior must map back to the same native kernel contract exposed by C++

Repository Layout

Stable Python layout in this repo:

supported library:
- python_api/velaria/
supported CLI tool:
- python_api/velaria_cli.py
examples:
- python_api/examples/
benchmarks:
- python_api/benchmarks/
reserved experimental area:
- python_api/experimental/
regression tests:
- python_api/tests/

Toolchain and Environment

Repository Python commands use uv.

Recommended local baseline:

CPython 3.12
uv
local CPython headers (Python.h)

Bazel Python detection currently probes local CPython interpreters in the 3.9 to 3.13 range. If auto-discovery fails, set:

export VELARIA_PYTHON_BIN=/path/to/python3.12

That interpreter must expose Python.h; otherwise Bazel cannot build the native extension.
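
Before pointing VELARIA_PYTHON_BIN at an interpreter, it can help to confirm that the interpreter actually ships its headers. The following is a minimal sketch using only the standard library; it checks the include directory reported by sysconfig, which is an assumption about where the headers live and may not match how Bazel's detection probes for them.

```python
import os
import sysconfig


def python_header_path() -> str:
    """Return the path where this interpreter expects Python.h to live."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


def has_python_headers() -> bool:
    """True if Python.h exists for the interpreter running this script."""
    return os.path.isfile(python_header_path())


if __name__ == "__main__":
    status = "found" if has_python_headers() else "missing"
    print(f"Python.h {status}: {python_header_path()}")
```

Run the script with the candidate interpreter itself (for example /path/to/python3.12) so that the result describes that interpreter rather than whichever python is first on PATH.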

Development Workflow

Bootstrap:

bazel build //:velaria_pyext
bazel run //python_api:sync_native_extension
uv sync --project python_api --python python3.12

If you run python_api/velaria_cli.py or other source-checkout Python entrypoints directly, keep python_api/velaria/_velaria.so in sync by rerunning:

bazel run //python_api:sync_native_extension

Run demos:

uv run --project python_api python python_api/examples/demo_batch_sql_arrow.py
uv run --project python_api python python_api/examples/demo_stream_sql.py
uv run --project python_api python python_api/examples/demo_vector_search.py

Recommended regression entrypoint:

./scripts/run_python_ecosystem_regression.sh

That script covers:

native extension build
wheel and native wheel build
Bazel Python regression targets
demo smoke tests
CLI smoke tests

Packaging

Build targets:

native extension:
- //:velaria_pyext
sync built native extension into the source checkout:
- //python_api:sync_native_extension
pure-Python wheel wrapper:
- //python_api:velaria_whl
native wheel:
- //python_api:velaria_native_whl
Python CLI:
- //python_api:velaria_cli

Single-file CLI packaging:

./scripts/build_py_cli_executable.sh
./dist/velaria-cli csv-sql \
  --csv /path/to/input.csv \
  --query "SELECT * FROM input_table LIMIT 5"

The CLI is part of the ecosystem layer. For supported paths, it should delegate to the same native session contract as Python and C++.

Repo-visible CLI entrypoints are:

source checkout:
- uv run --project python_api python python_api/velaria_cli.py ...
packaged binary:
- ./dist/velaria-cli ...

Do not assume a global velaria-cli command exists unless you have separately installed and exposed one in your environment.
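
For scripts that need to call the CLI from Python, the two entrypoints above can be wrapped in a small helper. This is only a sketch: velaria_cli_argv and its repo_root parameter are hypothetical names introduced here, and the helper simply rebuilds the command lines shown in this section.

```python
from pathlib import Path


def velaria_cli_argv(repo_root, cli_args, packaged=False):
    """Build the argv list for one of the two repo-visible CLI entrypoints."""
    root = Path(repo_root)
    if packaged:
        # Packaged binary produced by scripts/build_py_cli_executable.sh.
        prefix = [str(root / "dist" / "velaria-cli")]
    else:
        # Source checkout, run through uv as in the commands above.
        prefix = [
            "uv", "run", "--project", str(root / "python_api"),
            "python", str(root / "python_api" / "velaria_cli.py"),
        ]
    return prefix + list(cli_args)
```

The returned list can be passed to subprocess.run(argv, check=True). Because the helper never falls back to a bare velaria-cli name, it does not depend on a globally installed command.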

Workspace + Artifacts

The CLI also supports a local workspace layout for tracked runs and artifact indexing.

Default paths:

runs: ~/.velaria/runs/<run_id>/
index: ~/.velaria/index/artifacts.sqlite

You can override the root with:

export VELARIA_HOME=/tmp/velaria-home
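
The path rules above can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes that VELARIA_HOME replaces ~/.velaria as the root while the runs/ and index/ layout beneath it stays the same; check the CLI itself if the exact layout matters.

```python
import os
from pathlib import Path


def velaria_home(env=None) -> Path:
    """Resolve the workspace root: VELARIA_HOME if set, else ~/.velaria."""
    env = os.environ if env is None else env
    override = env.get("VELARIA_HOME")
    return Path(override) if override else Path.home() / ".velaria"


def run_dir(run_id, env=None) -> Path:
    """Directory that holds one tracked run (assumed layout: <root>/runs/<run_id>)."""
    return velaria_home(env) / "runs" / run_id


def artifact_index_path(env=None) -> Path:
    """SQLite artifact index (assumed layout: <root>/index/artifacts.sqlite)."""
    return velaria_home(env) / "index" / "artifacts.sqlite"
```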

Tracked run commands:

uv run --project python_api python python_api/velaria_cli.py run start -- csv-sql \
  --csv /path/to/input.csv \
  --query "SELECT * FROM input_table LIMIT 5"

./dist/velaria-cli run start -- csv-sql \
  --csv /path/to/input.csv \
  --query "SELECT * FROM input_table LIMIT 5"

uv run --project python_api python python_api/velaria_cli.py run show --run-id <run_id>
uv run --project python_api python python_api/velaria_cli.py run status --run-id <run_id>
uv run --project python_api python python_api/velaria_cli.py artifacts list --run-id <run_id>
uv run --project python_api python python_api/velaria_cli.py artifacts preview --artifact-id <artifact_id>
uv run --project python_api python python_api/velaria_cli.py run cleanup --keep-last 10

The tracked workspace contract is:

stdout returns JSON only
logs go to stdout.log / stderr.log
stream progress appends native snapshotJson() output to progress.jsonl
stream explain keeps the native logical / physical / strategy structure
large results stay in files under artifacts/; SQLite stores only index rows and small previews
deleting run directories requires the explicit --delete-files switch
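
Since progress is stored as JSON Lines, a run's progress can be inspected with a few lines of Python. The sketch below assumes that progress.jsonl sits directly inside the run directory and that each non-empty line is one complete JSON snapshot; it makes no assumption about the fields inside a snapshot, and the function names are hypothetical.

```python
import json
from pathlib import Path


def read_progress_snapshots(run_dir):
    """Return every snapshot in <run_dir>/progress.jsonl as parsed JSON values."""
    path = Path(run_dir) / "progress.jsonl"
    if not path.exists():
        return []
    snapshots = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between snapshots
                snapshots.append(json.loads(line))
    return snapshots


def latest_snapshot(run_dir):
    """Return the most recently appended snapshot, or None if there is none."""
    snapshots = read_progress_snapshots(run_dir)
    return snapshots[-1] if snapshots else None
```

Reading the file this way keeps Python a consumer of the native snapshot output rather than the owner of a separate progress schema.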

End-to-end examples:

CSV SQL to Parquet plus preview:

uv run --project python_api python python_api/velaria_cli.py run start -- csv-sql \
  --csv /path/to/input.csv \
  --query "SELECT name, score FROM input_table WHERE score > 10"

uv run --project python_api python python_api/velaria_cli.py artifacts list --run-id <run_id>
uv run --project python_api python python_api/velaria_cli.py artifacts preview --artifact-id <artifact_id>

Stream SQL once plus status:

uv run --project python_api python python_api/velaria_cli.py run start -- stream-sql-once \
  --source-csv-dir /path/to/source_dir \
  --sink-schema "key STRING, value_sum INT" \
  --query "INSERT INTO output_sink SELECT key, SUM(value) AS value_sum FROM input_stream GROUP BY key"

uv run --project python_api python python_api/velaria_cli.py run status --run-id <run_id>

Vector search plus explain artifact:

uv run --project python_api python python_api/velaria_cli.py run start -- vector-search \
  --csv /path/to/vectors.csv \
  --vector-column embedding \
  --query-vector "0.1,0.2,0.3" \
  --top-k 5

uv run --project python_api python python_api/velaria_cli.py artifacts list --run-id <run_id>

Python ecosystem source groups:

supported:
- //python_api:velaria_python_supported_sources
examples and benchmarks:
- //python_api:velaria_python_example_sources
experimental placeholder:
- //python_api:velaria_python_experimental_sources

Arrow Contract

Supported Arrow ingestion inputs:

pyarrow.Table
pyarrow.RecordBatch
pyarrow.RecordBatchReader
objects implementing __arrow_c_stream__
Python sequences of Arrow batches

Preferred Arrow shape for vector columns:

FixedSizeList<float32>

Preferred local CSV vector text shape:

[1 2 3]
[1,2,3]
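
To make the two text shapes concrete, here is a small illustrative parser that accepts both. It is not Velaria's parser: parse_vector_text is a hypothetical helper, and the native CSV reader may be stricter or more lenient about whitespace and separators.

```python
import re


def parse_vector_text(text):
    """Parse '[1 2 3]' or '[1,2,3]' into a list of floats."""
    stripped = text.strip()
    if not (stripped.startswith("[") and stripped.endswith("]")):
        raise ValueError(f"expected a bracketed vector, got {text!r}")
    body = stripped[1:-1].strip()
    if not body:
        return []
    # Split on commas and/or whitespace so both documented shapes are accepted.
    return [float(token) for token in re.split(r"[,\s]+", body)]
```

A helper like this is useful for validating a CSV column before ingestion, for example to confirm that every row has the same dimension.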

Current vector search scope:

local exact scan only
metrics: cosine, dot, l2
no ANN / distributed execution / standalone vector DB behavior
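
For readers unfamiliar with the term, "exact scan" means scoring the query against every row and keeping the best k, with no index. The pure-Python sketch below illustrates the idea for the three metrics; it is for reading only and is not a substitute for Session.vector_search(...), which remains the supported scoring path. The function names, the treatment of cosine and dot as similarities (higher is better), and of l2 as a distance (lower is better) are assumptions of this sketch, not a statement of how the native kernel reports scores.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score(metric, query, candidate):
    """Score one candidate vector against the query under the given metric."""
    if len(query) != len(candidate):
        raise ValueError("dimension mismatch")
    if metric == "dot":
        return _dot(query, candidate)
    if metric == "cosine":
        norm = math.sqrt(_dot(query, query)) * math.sqrt(_dot(candidate, candidate))
        return _dot(query, candidate) / norm if norm else 0.0
    if metric == "l2":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(query, candidate)))
    raise ValueError(f"unknown metric: {metric!r}")


def exact_scan(rows, query, metric="cosine", top_k=5):
    """Score every (row_id, vector) pair and return the top_k, best first."""
    scored = [(row_id, score(metric, query, vec)) for row_id, vec in rows]
    # Similarities sort descending; the l2 distance sorts ascending.
    scored.sort(key=lambda item: item[1], reverse=(metric != "l2"))
    return scored[:top_k]
```

The cost is linear in the number of rows, which is why this scope is described as local and exact rather than as an ANN or distributed search.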

Excel, Bitable, and Custom Streams

Excel

read_excel(...) reads .xlsx through:

pandas.read_excel
pyarrow.Table conversion
Session.create_dataframe_from_arrow(...)

Example:

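from velaria import Session, read_excel

session = Session()
# Load one worksheet into a Velaria DataFrame (pandas -> Arrow -> native session).
df = read_excel(session, "/path/to/file.xlsx", sheet_name="Sheet1")
# Register the DataFrame under a name so SQL can reference it.
session.create_temp_view("staff", df)
print(session.sql("SELECT * FROM staff LIMIT 5").to_rows())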

Bitable and Custom Streams

Supported ecosystem integrations include:

Bitable-backed stream source flows
custom Arrow stream sources
custom Arrow stream sinks

These are supported as ecosystem integrations, not as alternate execution cores.

Regression Matrix

Python ecosystem regression targets:

//python_api:streaming_v05_test
//python_api:arrow_stream_ingestion_test
//python_api:vector_search_test
//python_api:read_excel_test
//python_api:custom_stream_source_test
//python_api:bitable_stream_source_test
//python_api:bitable_group_by_owner_integration_test

Python-layer grouped suite:

//python_api:velaria_python_supported_regression

Root-level grouped suite:

//:python_ecosystem_regression

Relation to Core

Python may:

wrap
package
automate
project ecosystem-friendly names

Python may not:

redefine progress/checkpoint/explain semantics
become the source of truth for runtime decisions
introduce a second vector-search implementation for supported interfaces

For core boundaries, see docs/core-boundary.md. For stable runtime semantics, see docs/runtime-contract.md.
