A simple distributed system for offloading text embedding tasks to a local GPU.
If you have a machine with a GPU but want to run your main application somewhere else (like AWS), this project lets you:
- Submit texts from anywhere via a simple REST API
- Process embeddings on your local GPU using Ollama
- Retrieve results when they're ready
The queue server runs in the cloud, your GPU worker runs locally, and they communicate over HTTP.
```
Your App (cloud)       Queue Server (cloud)           Your PC (local GPU)
      │                        │                              │
      ├── Submit text ────────►│                              │
      │                        │◄──── Worker polls for tasks ─┤
      │                        │                              │
      │                        │──── Send text to process ───►│
      │                        │                              ├── Ollama generates
      │                        │◄──── Return embedding ───────┤   embedding
      │                        │                              │
      ◄── Get result ──────────┤                              │
```
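The worker side of this cycle can be sketched as a plain poll-process-report loop. This is an illustrative sketch only: `claim_task`, `embed`, and `report_result` are hypothetical injected callables standing in for the worker's HTTP calls to the queue server and to Ollama (the real logic lives in `worker/worker.py` and `worker/embedder.py`):

```python
import time

def worker_loop(claim_task, embed, report_result,
                poll_interval=2.0, max_iterations=None):
    """Skeleton of the GPU worker's poll-process-report cycle.

    claim_task, embed, and report_result are injected callables that
    stand in for the worker's HTTP calls; they are hypothetical names,
    not this project's actual API.
    """
    n = 0
    while max_iterations is None or n < max_iterations:
        n += 1
        task = claim_task()              # poll the queue server for a pending task
        if task is None:
            time.sleep(poll_interval)    # nothing queued; wait POLL_INTERVAL seconds
            continue
        vector = embed(task["input"])    # generate the embedding locally
        report_result(task["id"], vector)  # send the embedding back to the server
```

Injecting the three calls keeps the loop itself free of HTTP details, which also makes it easy to test offline.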
```bash
docker-compose up --build
docker exec -it embeddingqueue-ollama-1 ollama pull nomic-embed-text
```

curl:
```bash
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Authorization: Bearer your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{"input": "The quick brown fox jumps over the lazy dog", "model": "nomic-embed-text"}'
```

Python (OpenAI SDK):
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-secret-token",
    base_url="http://localhost:8000/v1"
)

response = client.embeddings.create(
    input="The quick brown fox jumps over the lazy dog",
    model="nomic-embed-text"
)

embedding = response.data[0].embedding
```

Response (OpenAI format):
```json
{
  "object": "list",
  "data": [{"object": "embedding", "embedding": [0.123, -0.456, ...], "index": 0}],
  "model": "nomic-embed-text"
}
```

The server waits for the result (default 10 seconds, max 30 seconds). If processing takes longer, it returns a task ID instead:

```json
{"id": "550e8400-e29b-41d4-a716-446655440000"}
```

Then poll for the result:
```bash
curl http://localhost:8000/tasks/550e8400-e29b-41d4-a716-446655440000 \
  -H "Authorization: Bearer your-secret-token"
```

| Method | Endpoint | Description |
|---|---|---|
| POST | `/v1/embeddings` | OpenAI-compatible - sync with long polling |
| GET | `/tasks/{id}` | Get task status and result (for polling) |
| GET | `/tasks/{id}/result` | Get only the embedding |
| GET | `/health` | Health check |
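Because `POST /v1/embeddings` answers either with an embedding (synchronous path) or with a bare task ID (when processing outruns the wait window), a client has to handle both shapes. A minimal sketch, assuming only the two response formats shown above:

```python
def parse_embedding_response(body):
    """Interpret the JSON body returned by POST /v1/embeddings.

    Returns ("embedding", vector) for a synchronous OpenAI-format reply,
    or ("task", task_id) when the server fell back to a task ID.
    Assumes only the two response shapes documented above.
    """
    if "data" in body:  # OpenAI-format list response arrived in time
        return "embedding", body["data"][0]["embedding"]
    return "task", body["id"]  # otherwise: poll /tasks/{id} with this
```

In the "task" case the returned ID is what you feed to `GET /tasks/{id}`.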
All endpoints require an `Authorization: Bearer your-secret-token` header.
`POST /v1/embeddings`

| Field | Type | Default | Description |
|---|---|---|---|
| `input` | string | required | Text to embed |
| `model` | string | `nomic-embed-text` | Model name (optional) |
| `wait_seconds` | int | 10 | Wait time in seconds (0 = return ID immediately, max 30) |
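A small helper that assembles the request from these fields makes the fire-and-forget pattern (`wait_seconds=0`) explicit. This is an illustrative sketch, not part of the project; the defaults mirror the table above:

```python
def build_embedding_request(text, model="nomic-embed-text",
                            wait_seconds=10, token="your-secret-token"):
    """Assemble headers and JSON body for POST /v1/embeddings.

    wait_seconds follows the documented bounds: 0 makes the server return
    a task ID immediately; the server caps the wait at 30 seconds.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = {"input": text, "model": model, "wait_seconds": wait_seconds}
    return headers, body
```

Pass the result to any HTTP client; with `wait_seconds=0` you submit the text and immediately get back an ID to poll.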
Edit the `.env` file to customize:

| Variable | Default | Description |
|---|---|---|
| `AUTH_TOKEN` | `your-secret-token` | API authentication token |
| `EMBEDDING_MODEL` | `nomic-embed-text` | Ollama model to use |
| `POLL_INTERVAL` | `2` | Worker poll interval (seconds) |
| `SERVER_PORT` | `8000` | Server port |
| `DB_PATH` | `data/embedding_queue.db` | SQLite database path |
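For reference, a `.env` file populated with the defaults from the table above:

```
AUTH_TOKEN=your-secret-token
EMBEDDING_MODEL=nomic-embed-text
POLL_INTERVAL=2
SERVER_PORT=8000
DB_PATH=data/embedding_queue.db
```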
- Docker & Docker Compose
- NVIDIA GPU with NVIDIA Container Toolkit
```
EmbeddingQueue/
├── server/                  # FastAPI queue server
│   ├── main.py              # API endpoints
│   ├── database.py          # SQLite operations
│   ├── models.py            # Request/response models
│   └── config.py            # Configuration
├── worker/                  # GPU worker client
│   ├── worker.py            # Polling loop
│   ├── embedder.py          # Ollama client
│   └── config.py            # Configuration
├── docker-compose.yml
├── lightsail-deploy.json    # AWS Lightsail deployment
├── .env                     # Environment variables
└── data/                    # SQLite database (persistent)
```
- `pending` - Waiting to be processed
- `processing` - Worker is generating embedding
- `completed` - Embedding ready
- `failed` - Error occurred (check `error` field)
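The polling pattern against `GET /tasks/{id}` can be wrapped in a small helper. A sketch, assuming the task JSON carries `status` and (on failure) `error` fields matching the statuses above; the HTTP call is injected as `fetch_status` so the loop itself stays testable offline:

```python
import time

def wait_for_task(fetch_status, task_id, poll_interval=1.0, timeout=30.0):
    """Poll a task until it reaches a terminal status.

    fetch_status is any callable returning the task dict for task_id,
    e.g. one that GETs /tasks/{task_id} with the Bearer token.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = fetch_status(task_id)
        status = task.get("status")
        if status == "completed":      # embedding ready
            return task
        if status == "failed":         # surface the server-reported error
            raise RuntimeError(f"task failed: {task.get('error')}")
        time.sleep(poll_interval)      # still pending/processing; try again
    raise TimeoutError(f"task {task_id} not done within {timeout}s")
```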
Deploy the server container to AWS Lightsail for cloud hosting.
```bash
pip install awscli
aws configure
```

```bash
aws lightsail create-container-service \
  --service-name embedding-queue \
  --power small \
  --scale 1
```

```bash
# Build the server image
docker build -t embedding-queue-server ./server

# Push to Lightsail
aws lightsail push-container-image \
  --service-name embedding-queue \
  --label server \
  --image embedding-queue-server
```

Update lightsail-deploy.json with the image name from step 3, then:

```bash
aws lightsail create-container-service-deployment \
  --service-name embedding-queue \
  --cli-input-json file://lightsail-deploy.json
```

```bash
aws lightsail get-container-services --service-name embedding-queue
```

The URL will be in the format: `https://embedding-queue.xxxxx.us-east-1.cs.amazonlightsail.com`
Point your local worker to the Lightsail server:
```bash
SERVER_URL=https://embedding-queue.xxxxx.us-east-1.cs.amazonlightsail.com \
AUTH_TOKEN=your-secret-token \
docker-compose up worker ollama
```

Edit `lightsail-deploy.json` to customize:

| Field | Description |
|---|---|
| `serviceName` | Lightsail service name |
| `environment.AUTH_TOKEN` | API authentication token |
| `publicEndpoint.healthCheck` | Health check settings |