SPLAI is a distributed runtime for AI workloads. It turns requests into task graphs, schedules them across workers, and returns traceable results.
- Run AI jobs across many CPU workers (GPU optional)
- Keep execution on your own infrastructure (local, VM, Kubernetes, edge)
- Route and govern model/tool execution with policy controls
- Integrate with existing AI apps via native REST or OpenAI-compatible endpoints
This quickstart is written for first-time users and takes you from zero to a running job.
Open terminal A:

```bash
cd /Users/mchenetz/git/SPLAI
go run ./cmd/api-gateway
```

Open terminal B:

```bash
cd /Users/mchenetz/git/SPLAI
go run ./worker/cmd/worker-agent
```

Open terminal C:
```bash
curl -s -X POST http://localhost:8080/v1/jobs \
  -H 'content-type: application/json' \
  -d '{
    "type":"chat",
    "input":"Analyze 500 support tickets and produce root causes.",
    "policy":"enterprise-default",
    "priority":"interactive"
  }' | jq
```

Expected output pattern:

```json
{
  "job_id": "job-1"
}
```

Check job status:

```bash
curl -s http://localhost:8080/v1/jobs/job-1 | jq
```
List the job's tasks:

```bash
curl -s http://localhost:8080/v1/jobs/job-1/tasks | jq
```

Filter tasks by status:

```bash
curl -s "http://localhost:8080/v1/jobs/job-1/tasks?status=Completed" | jq
```

Stream job events:

```bash
curl -N http://localhost:8080/v1/jobs/job-1/stream
```

Events include:

- `job.snapshot`
- `job.status`
- `task.update`
- a terminal event such as `job.completed` / `job.failed`
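The stream endpoint can also be consumed programmatically. Below is a minimal sketch of a server-sent-events consumer using only the Python standard library; the event names come from the list above, but the exact framing (`event:` / `data:` lines separated by blank lines) is an assumption about the gateway's SSE encoding:

```python
import urllib.request

def parse_sse(lines):
    """Parse an iterable of SSE lines into (event, data) pairs.

    Accumulates `event:` and `data:` fields and yields one pair per
    blank-line-terminated record, per the usual SSE framing.
    """
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and (event or data):
            yield event, "\n".join(data)
            event, data = None, []

def follow_job(base="http://localhost:8080", job_id="job-1"):
    """Print job events until a terminal event arrives (needs a running gateway)."""
    with urllib.request.urlopen(f"{base}/v1/jobs/{job_id}/stream") as resp:
        lines = (chunk.decode("utf-8") for chunk in resp)
        for event, data in parse_sse(lines):
            print(event, data)
            if event in ("job.completed", "job.failed"):
                break

# follow_job()  # requires a running gateway and an existing job
```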
Inspect a task artifact:

```bash
cat /tmp/splai-artifacts/job-1/t1-split/output.json | jq
```

For guided docs written in instructional style, use:

- /Users/mchenetz/git/SPLAI/docs/reference/quickstart.md
- /Users/mchenetz/git/SPLAI/docs/reference/user-guide.md
- /Users/mchenetz/git/SPLAI/docs/reference/integration-guide.md
Enable compatibility mode:

```bash
SPLAI_OPENAI_COMPAT=true go run ./cmd/api-gateway
```

Supported endpoints:

- `POST /v1/chat/completions`
- `POST /v1/responses`

Current behavior:

- Non-streaming only (`stream=true` is rejected)
- Calls are translated to SPLAI jobs and awaited synchronously
- Timeout controlled by `SPLAI_OPENAI_COMPAT_TIMEOUT_SECONDS` (default `60`)
Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="local-dev-token",  # required by the SDK; SPLAI ignores it unless auth is enabled
)
resp = client.chat.completions.create(
    model="llama3-8b-q4",
    messages=[
        {"role": "system", "content": "You are a support analyst."},
        {"role": "user", "content": "Summarize root causes from these 500 tickets."},
    ],
)
print(resp.choices[0].message.content)
```

JavaScript (OpenAI SDK):

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "local-dev-token",
});
const resp = await client.chat.completions.create({
  model: "llama3-8b-q4",
  messages: [{ role: "user", content: "Generate top 5 ticket themes." }],
});
console.log(resp.choices[0].message.content);
```

Use the OpenAI-compatible SPLAI endpoint as the model backend:
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama3-8b-q4",
    base_url="http://localhost:8080/v1",
    api_key="local-dev-token",
)
print(llm.invoke("Classify these support incidents by root cause."))
```

Point the OpenAI LLM settings at the SPLAI compatibility endpoint:
```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="llama3-8b-q4",
    api_base="http://localhost:8080/v1",
    api_key="local-dev-token",
)
print(llm.complete("Summarize this incident cluster."))
```

Use an HTTP task (or `PythonOperator`) to submit and poll SPLAI jobs:
```python
import requests
import time

base = "http://localhost:8080"
job = requests.post(f"{base}/v1/jobs", json={
    "type": "chat",
    "input": "Analyze daily support export and produce root causes",
    "policy": "enterprise-default",
    "priority": "batch",
}).json()
job_id = job["job_id"]

while True:
    status = requests.get(f"{base}/v1/jobs/{job_id}").json()
    if status["status"] in ["Completed", "Failed", "Canceled"]:
        print(status)
        break
    time.sleep(2)
```

- Node 1: HTTP `POST /v1/jobs`
- Node 2: Wait/Delay
- Node 3: HTTP `GET /v1/jobs/{id}`
- Node 4: Branch on status (`Completed`/`Failed`)
This pattern gives async reliability with retries and queueing from SPLAI.
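The polling loop above sleeps a fixed two seconds; for long batch jobs, a capped exponential backoff with an overall deadline is gentler on the gateway. A sketch, where the `fetch_status` callable stands in for the `GET /v1/jobs/{id}` request and the terminal status names match those used above:

```python
import time

TERMINAL = {"Completed", "Failed", "Canceled"}

def poll_until_terminal(fetch_status, initial=1.0, cap=30.0, timeout=3600.0):
    """Call fetch_status() until it returns a terminal job status.

    Sleeps between attempts, doubling the delay up to `cap`, and raises
    TimeoutError if no terminal status arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    delay = initial
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not reach a terminal status")
        time.sleep(delay)
        delay = min(delay * 2, cap)

# Example with a stub that completes on the third poll:
calls = iter(["Queued", "Running", "Completed"])
result = poll_until_terminal(lambda: {"status": next(calls)}, initial=0.01)
print(result)  # {'status': 'Completed'}
```

In production the stub would be `lambda: requests.get(f"{base}/v1/jobs/{job_id}").json()`, as in the loop above.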
OCI chart (published by a GitHub Action):

```bash
helm install splai oci://ghcr.io/mchenetz/charts/splai --version <chart-version> -n splai-system --create-namespace
```

Then access the gateway locally:

```bash
kubectl -n splai-system port-forward svc/splai-splai-api-gateway 8080:8080
```

The planner and scheduler deployments now run full HTTP services:

- Planner (default `:8081`): `GET /healthz`, `POST /v1/planner/compile`
- Scheduler (default `:8082`): `GET /healthz`, `POST /v1/scheduler/jobs`, `GET /v1/scheduler/jobs/{id}`, `GET /v1/scheduler/jobs/{id}/tasks`, `POST /v1/scheduler/workers/register`, `POST /v1/scheduler/workers/{id}/heartbeat`, `GET /v1/scheduler/workers/{id}/assignments`, `POST /v1/scheduler/tasks/report`
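A quick way to confirm both services are up is to probe their `/healthz` endpoints. Here is a minimal sketch using only the standard library; the ports are the defaults listed above:

```python
import urllib.error
import urllib.request

def healthz(base):
    """Return True if GET {base}/healthz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base}/healthz", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# for name, base in [("planner", "http://localhost:8081"),
#                    ("scheduler", "http://localhost:8082")]:
#     print(name, "ok" if healthz(base) else "unreachable")
```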
Build/install helper binaries:

```bash
make install-worker
```

Join a host as a worker:

```bash
splaictl worker join --url http://<gateway>:8080 --service systemd
```

Verify:

```bash
splaictl verify --url http://<gateway>:8080
```

Generate a token for controlled environments:

```bash
splaictl worker token create
```

Trigger a model download on workers with one API call:
```bash
curl -s -X POST http://localhost:8080/v1/admin/models/prefetch \
  -H 'Content-Type: application/json' \
  -d '{
    "model":"meta-llama/Llama-3-8B-Instruct",
    "source":"huggingface",
    "workers":["worker-a","worker-b"],
    "only_missing":true
  }'
```

Enable token auth by setting `SPLAI_API_TOKENS`. When enabled, send bearer tokens in `Authorization: Bearer <token>` (or the `X-SPLAI-Token` header). Example:
```bash
SPLAI_API_TOKENS='operator-token:operator|metrics,tenant-a-token:tenant:tenant-a'
```

- Persistent state: Postgres (`SPLAI_STORE=postgres`, `SPLAI_POSTGRES_DSN=...`)
- Distributed queue: Redis (`SPLAI_QUEUE=redis`, `SPLAI_REDIS_ADDR=...`)
- Tracing/metrics: OpenTelemetry + Prometheus endpoints
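When token auth is enabled, every API client in the examples above needs the bearer header; a small helper keeps that in one place. The token value below is a placeholder, and the `X-SPLAI-Token` variant is the documented alternative header:

```python
import json
import urllib.request

def auth_headers(token, use_alt_header=False):
    """Build SPLAI auth headers: Authorization: Bearer <token> by default,
    or the X-SPLAI-Token alternative."""
    if use_alt_header:
        return {"X-SPLAI-Token": token}
    return {"Authorization": f"Bearer {token}"}

def get_job(base, job_id, token):
    """Fetch job status with auth (requires a running gateway)."""
    req = urllib.request.Request(
        f"{base}/v1/jobs/{job_id}", headers=auth_headers(token)
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# get_job("http://localhost:8080", "job-1", "tenant-a-token")  # needs a live gateway
```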
- Helm chart: `charts/splai/`
- CLI: `cmd/splaictl/`
- Worker runtime: `worker/`
- OpenAPI: `openapi/splai-admin-task.yaml`
- Proto: `proto/splai/v1/`
- Architecture: `docs/architecture/kubernetes-crd-native.md`
- Persistent mode: `docs/reference/persistent-control-plane.md`
- Complete reference: `docs/reference/complete-operations-reference.md`
- Build/release guide: `docs/reference/build-release-registry-helm.md`