This document is the entrypoint for Velaria's supported Python ecosystem layer.
Python is a supported ingress, interop, and packaging surface. It is not the execution core. Core semantics still come from the native kernel and the runtime contract in docs/runtime-contract.md.
The supported Python ecosystem includes:
- the `velaria/` package and `Session` API
- Arrow ingestion and Arrow output
- `uv`-based local workflow
- native extension build
- wheel / native wheel packaging
- the supported CLI entrypoint `velaria_cli.py`
- Excel ingestion via `read_excel(...)`
- Bitable adapters and stream source integration
- custom source / custom sink adapters
- vector search and vector explain APIs
Examples and helper assets include:
- `examples/demo_batch_sql_arrow.py`
- `examples/demo_stream_sql.py`
- `examples/demo_bitable_group_by_owner.py`
- `examples/demo_vector_search.py`
- `benchmarks/bench_arrow_ingestion.py`
- local ecosystem scripts and skills
The Python experimental area is currently reserved under experimental/.
Anything placed there is explicitly outside the supported ecosystem surface until it is promoted into velaria/, velaria_cli.py, or a supported adapter module.
Python does not define:
- execution hot-path semantics
- a separate progress schema
- a separate checkpoint contract
- a separate vector scoring implementation for supported APIs
- Python UDFs in the hot path
Main Session API:
- `Session.read_csv(...)`
- `Session.sql(...)`
- `Session.create_dataframe_from_arrow(...)`
- `Session.create_stream_from_arrow(...)`
- `Session.create_temp_view(...)`
- `Session.read_stream_csv_dir(...)`
- `Session.stream_sql(...)`
- `Session.explain_stream_sql(...)`
- `Session.start_stream_sql(...)`
- `Session.vector_search(...)`
- `Session.explain_vector_search(...)`
Additional ecosystem helpers:
- `read_excel(...)`
- `CustomArrowStreamSource`
- `CustomArrowStreamSink`
- `create_stream_from_custom_source(...)`
- `consume_arrow_batches_with_custom_sink(...)`
Mapping rule:
- Python names may be ecosystem-friendly
- behavior must map back to the same native kernel contract exposed by C++
Stable Python layout in this repo:
- supported library: `python_api/velaria/`
- supported CLI tool: `python_api/velaria_cli.py`
- examples: `python_api/examples/`
- benchmarks: `python_api/benchmarks/`
- reserved experimental area: `python_api/experimental/`
- regression tests: `python_api/tests/`
Repository Python commands use uv.
Recommended local baseline:
- CPython 3.12
- `uv`
- local CPython headers (`Python.h`)
Bazel Python detection currently probes local CPython interpreters in the 3.9 to 3.13 range. If auto-discovery fails, set:
```bash
export VELARIA_PYTHON_BIN=/path/to/python3.12
```

That interpreter must expose `Python.h`; otherwise Bazel cannot build the native extension.
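Before exporting `VELARIA_PYTHON_BIN`, you can check whether a given interpreter exposes `Python.h` by running a small stdlib-only script with that interpreter. This is a sketch, not part of the Velaria tooling: the function name `python_headers_status` is hypothetical, and it assumes the headers live in the interpreter's standard `sysconfig` include directory, which is the usual layout but not guaranteed for every distribution.

```python
import os
import sys
import sysconfig


def python_headers_status() -> tuple[str, str, bool]:
    """Return (interpreter path, include dir, whether Python.h exists there)."""
    # Assumption: the headers sit in sysconfig's "include" path for this interpreter.
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return sys.executable, include_dir, os.path.isfile(header)


if __name__ == "__main__":
    exe, include_dir, ok = python_headers_status()
    print(f"interpreter: {exe}")
    print(f"include dir: {include_dir}")
    print("Python.h found" if ok else "Python.h missing: install the CPython development headers")
    if ok:
        # Candidate value to export when Bazel auto-discovery fails.
        print(f"export VELARIA_PYTHON_BIN={exe}")
```

Run it as `/path/to/python3.12 check_headers.py` (the file name is arbitrary); the printed `export` line only appears when the header was found.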
Bootstrap:

```bash
bazel build //:velaria_pyext
bazel run //python_api:sync_native_extension
uv sync --project python_api --python python3.12
```

If you run `python_api/velaria_cli.py` or other source-checkout Python entrypoints directly, keep `python_api/velaria/_velaria.so` in sync with:

```bash
bazel run //python_api:sync_native_extension
```

Run demos:
```bash
uv run --project python_api python python_api/examples/demo_batch_sql_arrow.py
uv run --project python_api python python_api/examples/demo_stream_sql.py
uv run --project python_api python python_api/examples/demo_vector_search.py
```

Recommended regression entrypoint:
```bash
./scripts/run_python_ecosystem_regression.sh
```

That script covers:
- native extension build
- wheel and native wheel build
- Bazel Python regression targets
- demo smoke
- CLI smoke
Build targets:
- native extension: `//:velaria_pyext`
- sync built native extension into the source checkout: `//python_api:sync_native_extension`
- pure-Python wheel wrapper: `//python_api:velaria_whl`
- native wheel: `//python_api:velaria_native_whl`
- Python CLI: `//python_api:velaria_cli`
Single-file CLI packaging:
```bash
./scripts/build_py_cli_executable.sh
./dist/velaria-cli csv-sql \
  --csv /path/to/input.csv \
  --query "SELECT * FROM input_table LIMIT 5"
```

The CLI is part of the ecosystem layer. For supported paths, it should delegate to the same native session contract as Python and C++.
Repo-visible CLI entrypoints are:
- source checkout: `uv run --project python_api python python_api/velaria_cli.py ...`
- packaged binary: `./dist/velaria-cli ...`
Do not assume a global velaria-cli command exists unless you have separately installed and exposed one in your environment.
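When scripting the CLI from Python, a thin `subprocess` wrapper keeps you on the two repo-visible entrypoints instead of a global command. The sketch below is hypothetical (the helper names are not part of Velaria), assumes it runs from the repository root so the relative paths resolve, and assumes the invoked command writes a single JSON document to stdout, which the tracked-run workspace contract described below guarantees.

```python
import json
import subprocess
from typing import Any, List, Sequence


def source_checkout_cli(args: Sequence[str]) -> List[str]:
    """argv for the source-checkout entrypoint (run from the repo root)."""
    return ["uv", "run", "--project", "python_api", "python", "python_api/velaria_cli.py", *args]


def packaged_cli(args: Sequence[str]) -> List[str]:
    """argv for the packaged single-file binary built into ./dist."""
    return ["./dist/velaria-cli", *args]


def run_json(argv: Sequence[str]) -> Any:
    """Run a command and parse its stdout as one JSON document.

    Raises subprocess.CalledProcessError on a non-zero exit code and
    json.JSONDecodeError if stdout is not valid JSON.
    """
    completed = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

For example, `run_json(source_checkout_cli(["run", "status", "--run-id", run_id]))` would return the parsed status document for `run_id`, provided the command succeeds.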
The CLI also supports a local workspace layout for tracked runs and artifact indexing.
Default paths:
- runs: `~/.velaria/runs/<run_id>/`
- index: `~/.velaria/index/artifacts.sqlite`

You can override the root with:

```bash
export VELARIA_HOME=/tmp/velaria-home
```

Tracked run commands:
```bash
uv run --project python_api python python_api/velaria_cli.py run start -- csv-sql \
  --csv /path/to/input.csv \
  --query "SELECT * FROM input_table LIMIT 5"
./dist/velaria-cli run start -- csv-sql \
  --csv /path/to/input.csv \
  --query "SELECT * FROM input_table LIMIT 5"
uv run --project python_api python python_api/velaria_cli.py run show --run-id <run_id>
uv run --project python_api python python_api/velaria_cli.py run status --run-id <run_id>
uv run --project python_api python python_api/velaria_cli.py artifacts list --run-id <run_id>
uv run --project python_api python python_api/velaria_cli.py artifacts preview --artifact-id <artifact_id>
uv run --project python_api python python_api/velaria_cli.py run cleanup --keep-last 10
```

The tracked workspace contract is:
- stdout returns JSON only
- logs go to `stdout.log` / `stderr.log`
- stream progress appends native `snapshotJson()` output to `progress.jsonl`
- stream explain keeps the native `logical` / `physical` / `strategy` structure
- large results stay in files under `artifacts/`; SQLite stores only index rows and small previews
- deleting run directories requires the explicit `--delete-files` switch
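For read-only inspection from Python, the workspace layout above can be resolved with a few path helpers. This is a minimal sketch under stated assumptions: the helper names are hypothetical, `VELARIA_HOME` is assumed to replace `~/.velaria` so that `runs/` and `index/` sit directly under it, and `progress.jsonl` is assumed to live directly in the run directory. The helpers only read the native `snapshotJson()` lines; they do not reinterpret progress semantics, and the CLI's `run status` and `artifacts` commands remain the supported interface.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Iterator, Mapping, Optional


def velaria_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Workspace root: $VELARIA_HOME if set and non-empty, else ~/.velaria."""
    env = os.environ if env is None else env
    override = env.get("VELARIA_HOME")
    return Path(override).expanduser() if override else Path.home() / ".velaria"


def run_dir(run_id: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Directory of one tracked run, e.g. ~/.velaria/runs/<run_id>/."""
    return velaria_home(env) / "runs" / run_id


def artifact_index(env: Optional[Mapping[str, str]] = None) -> Path:
    """SQLite artifact index, e.g. ~/.velaria/index/artifacts.sqlite."""
    return velaria_home(env) / "index" / "artifacts.sqlite"


def read_progress(run_id: str, env: Optional[Mapping[str, str]] = None) -> Iterator[Dict[str, Any]]:
    """Yield each non-blank line of the run's progress.jsonl as parsed JSON.

    Field names come from the native snapshotJson(); this helper passes them through unchanged.
    """
    path = run_dir(run_id, env) / "progress.jsonl"  # assumed location
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
```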
End-to-end examples:
CSV SQL to parquet plus preview:

```bash
uv run --project python_api python python_api/velaria_cli.py run start -- csv-sql \
  --csv /path/to/input.csv \
  --query "SELECT name, score FROM input_table WHERE score > 10"
uv run --project python_api python python_api/velaria_cli.py artifacts list --run-id <run_id>
uv run --project python_api python python_api/velaria_cli.py artifacts preview --artifact-id <artifact_id>
```

Stream SQL once plus status:
```bash
uv run --project python_api python python_api/velaria_cli.py run start -- stream-sql-once \
  --source-csv-dir /path/to/source_dir \
  --sink-schema "key STRING, value_sum INT" \
  --query "INSERT INTO output_sink SELECT key, SUM(value) AS value_sum FROM input_stream GROUP BY key"
uv run --project python_api python python_api/velaria_cli.py run status --run-id <run_id>
```

Vector search plus explain artifact:
```bash
uv run --project python_api python python_api/velaria_cli.py run start -- vector-search \
  --csv /path/to/vectors.csv \
  --vector-column embedding \
  --query-vector "0.1,0.2,0.3" \
  --top-k 5
uv run --project python_api python python_api/velaria_cli.py artifacts list --run-id <run_id>
```

Python ecosystem source groups:
- supported: `//python_api:velaria_python_supported_sources`
- examples and benchmarks: `//python_api:velaria_python_example_sources`
- experimental placeholder: `//python_api:velaria_python_experimental_sources`
Supported Arrow ingestion inputs:
- `pyarrow.Table`
- `pyarrow.RecordBatch`
- `pyarrow.RecordBatchReader`
- objects implementing `__arrow_c_stream__`
- Python sequences of Arrow batches
Preferred Arrow shape for vector columns: `FixedSizeList<float32>`
Preferred local CSV vector text shapes:
- `[1 2 3]`
- `[1,2,3]`
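If you want to validate a CSV column before loading it, a parser that accepts both documented text shapes can treat commas as whitespace inside the brackets. This is a hypothetical pre-check only; Velaria's native CSV reader does its own parsing, and this sketch may be more lenient than the native parser (for example, it also accepts `[1, 2]` with mixed separators).

```python
from typing import List


def parse_vector_text(text: str) -> List[float]:
    """Parse '[1 2 3]' or '[1,2,3]' into floats (illustrative, not Velaria's parser)."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"expected a bracketed vector, got {text!r}")
    # Treat commas as whitespace so both documented shapes share one code path.
    tokens = body[1:-1].replace(",", " ").split()
    return [float(tok) for tok in tokens]
```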
Current vector search scope:
- local exact scan only
- metrics: `cosine`, `dot`, `l2`
- no ANN / distributed execution / standalone vector DB behavior
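To make "local exact scan" concrete: every row is scored against the query vector, with no index and no approximation, and the best `k` are kept. The pure-Python sketch below only illustrates those semantics for reading purposes. It is not a supported path and must not replace `Session.vector_search(...)`, since Python may not introduce a second vector-search implementation for supported interfaces. Its score conventions are assumptions that may differ from the native kernel: `l2` is negated so that larger is always better, and ties keep the earlier row.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def _score(metric: str, a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    if metric == "dot":
        return dot
    if metric == "cosine":
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0
    if metric == "l2":
        # Negated Euclidean distance, so "larger is better" holds for every metric.
        return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    raise ValueError(f"unknown metric: {metric}")


def exact_top_k(
    rows: Sequence[Sequence[float]], query: Sequence[float], k: int, metric: str = "cosine"
) -> List[Tuple[int, float]]:
    """Score every row (exact scan) and return the best k as (row index, score)."""
    scores = [(idx, _score(metric, vec, query)) for idx, vec in enumerate(rows)]
    # heapq.nlargest is equivalent to a stable descending sort, so ties keep row order.
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])
```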
`read_excel(...)` reads `.xlsx` files through this pipeline:
- `pandas.read_excel`
- `pyarrow.Table` conversion
- `Session.create_dataframe_from_arrow(...)`
Example:

```python
from velaria import Session, read_excel

session = Session()
df = read_excel(session, "/path/to/file.xlsx", sheet_name="Sheet1")
session.create_temp_view("staff", df)
print(session.sql("SELECT * FROM staff LIMIT 5").to_rows())
```

Supported ecosystem integrations include:
- Bitable-backed stream source flows
- custom Arrow stream sources
- custom Arrow stream sinks
These are supported as ecosystem integrations, not as alternate execution cores.
Python ecosystem regression targets:
- `//python_api:streaming_v05_test`
- `//python_api:arrow_stream_ingestion_test`
- `//python_api:vector_search_test`
- `//python_api:read_excel_test`
- `//python_api:custom_stream_source_test`
- `//python_api:bitable_stream_source_test`
- `//python_api:bitable_group_by_owner_integration_test`
Python-layer grouped suite: `//python_api:velaria_python_supported_regression`

Root-level grouped suite: `//:python_ecosystem_regression`
Python may:
- wrap
- package
- automate
- project ecosystem-friendly names
Python may not:
- redefine progress/checkpoint/explain semantics
- become the source of truth for runtime decisions
- introduce a second vector-search implementation for supported interfaces
For core boundaries, see docs/core-boundary.md. For stable runtime semantics, see docs/runtime-contract.md.