# Embedding Queue

A simple distributed system for offloading text embedding tasks to a local GPU.

## What is this?

If you have a machine with a GPU but want to run your main application somewhere else (such as AWS), this project lets you:

1. Submit texts from anywhere via a simple REST API
2. Process embeddings on your local GPU using Ollama
3. Retrieve results when they're ready

The queue server runs in the cloud, your GPU worker runs locally, and they communicate over HTTP.

## Architecture

```
Your App (cloud)          Queue Server (cloud)           Your PC (local GPU)
      │                          │                              │
      ├── Submit text ──────────►│                              │
      │                          │◄──── Worker polls for tasks ─┤
      │                          │                              │
      │                          │──── Send text to process ───►│
      │                          │                              ├── Ollama generates
      │                          │◄──── Return embedding ───────┤    embedding
      │                          │                              │
      ◄── Get result ────────────┤                              │
```
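
To make the middle of this diagram concrete, here is a minimal sketch of the worker's poll loop. The `/worker/next` and `/worker/complete` endpoints are hypothetical stand-ins (the real protocol lives in `worker/worker.py`); the Ollama call uses Ollama's standard `/api/embeddings` API.

```python
import time
import requests

SERVER = "https://your-queue-server.example.com"  # queue server (cloud)
OLLAMA = "http://localhost:11434"                 # local Ollama instance
HEADERS = {"Authorization": "Bearer your-secret-token"}

while True:
    # Ask the queue for a pending task (hypothetical endpoint;
    # assume it returns null when the queue is empty).
    task = requests.get(f"{SERVER}/worker/next", headers=HEADERS).json()
    if task:
        # Generate the embedding on the local GPU via Ollama.
        embedding = requests.post(
            f"{OLLAMA}/api/embeddings",
            json={"model": "nomic-embed-text", "prompt": task["input"]},
        ).json()["embedding"]
        # Report the result back to the queue (hypothetical endpoint).
        requests.post(
            f"{SERVER}/worker/complete",
            headers=HEADERS,
            json={"id": task["id"], "embedding": embedding},
        )
    else:
        time.sleep(2)  # POLL_INTERVAL from the configuration below
```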

## Quick Start

### 1. Start the services

```bash
docker-compose up --build
```

### 2. Download an embedding model (first time only)

```bash
docker exec -it embeddingqueue-ollama-1 ollama pull nomic-embed-text
```

### 3. Get embeddings (OpenAI-compatible)

curl:

```bash
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Authorization: Bearer your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{"input": "The quick brown fox jumps over the lazy dog", "model": "nomic-embed-text"}'
```

Python (OpenAI SDK):

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-secret-token",
    base_url="http://localhost:8000/v1"
)

response = client.embeddings.create(
    input="The quick brown fox jumps over the lazy dog",
    model="nomic-embed-text"
)

embedding = response.data[0].embedding
```

Response (OpenAI format):

```json
{
  "object": "list",
  "data": [{"object": "embedding", "embedding": [0.123, -0.456, ...], "index": 0}],
  "model": "nomic-embed-text"
}
```

The server waits for the result (default 10 seconds, max 30 seconds). If processing takes longer, it returns a task ID instead:

```json
{"id": "550e8400-e29b-41d4-a716-446655440000"}
```

Then poll for the result:

```bash
curl http://localhost:8000/tasks/550e8400-e29b-41d4-a716-446655440000 \
  -H "Authorization: Bearer your-secret-token"
```

## API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/v1/embeddings` | OpenAI-compatible; synchronous with long polling |
| GET | `/tasks/{id}` | Get task status and result (for polling) |
| GET | `/tasks/{id}/result` | Get only the embedding |
| GET | `/health` | Health check |

All endpoints require the `Authorization: Bearer your-secret-token` header.
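
As a quick sketch, the health and result endpoints can be exercised with plain `requests`; the responses are printed as returned, since their exact shapes aren't specified above:

```python
import requests

BASE = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer your-secret-token"}

# Health check
print(requests.get(f"{BASE}/health", headers=HEADERS).json())

# Just the embedding for a finished task (skips the status wrapper)
task_id = "550e8400-e29b-41d4-a716-446655440000"  # ID returned by POST /v1/embeddings
print(requests.get(f"{BASE}/tasks/{task_id}/result", headers=HEADERS).json())
```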

## OpenAI Endpoint Details

### POST /v1/embeddings

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `input` | string | required | Text to embed |
| `model` | string | `nomic-embed-text` | Model name (optional) |
| `wait_seconds` | int | `10` | Wait time in seconds (0 = return a task ID immediately; max 30) |
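
`wait_seconds` is not a standard OpenAI field, so the OpenAI SDK example above doesn't send it; a plain HTTP call can. For example, to get a task ID back immediately from the local Quick Start deployment:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    headers={"Authorization": "Bearer your-secret-token"},
    json={
        "input": "The quick brown fox jumps over the lazy dog",
        "model": "nomic-embed-text",  # optional; defaults to nomic-embed-text
        "wait_seconds": 0,            # don't wait; return a task ID immediately
    },
)
print(resp.json())  # e.g. {"id": "550e8400-e29b-41d4-a716-446655440000"}
```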

## Configuration

Edit the `.env` file to customize:

| Variable | Default | Description |
|----------|---------|-------------|
| `AUTH_TOKEN` | `your-secret-token` | API authentication token |
| `EMBEDDING_MODEL` | `nomic-embed-text` | Ollama model to use |
| `POLL_INTERVAL` | `2` | Worker poll interval in seconds |
| `SERVER_PORT` | `8000` | Server port |
| `DB_PATH` | `data/embedding_queue.db` | SQLite database path |
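
For reference, a `.env` that spells out these defaults looks like:

```
AUTH_TOKEN=your-secret-token
EMBEDDING_MODEL=nomic-embed-text
POLL_INTERVAL=2
SERVER_PORT=8000
DB_PATH=data/embedding_queue.db
```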

## Requirements

- Docker and Docker Compose (the server, worker, and Ollama all run as containers)
- A machine with a GPU for the worker, running Ollama
- The AWS CLI, if you deploy the server to Lightsail (optional)

## Project Structure

```
EmbeddingQueue/
├── server/              # FastAPI queue server
│   ├── main.py          # API endpoints
│   ├── database.py      # SQLite operations
│   ├── models.py        # Request/response models
│   └── config.py        # Configuration
├── worker/              # GPU worker client
│   ├── worker.py        # Polling loop
│   ├── embedder.py      # Ollama client
│   └── config.py        # Configuration
├── docker-compose.yml
├── lightsail-deploy.json  # AWS Lightsail deployment
├── .env                 # Environment variables
└── data/                # SQLite database (persistent)
```

## Task States

- `pending` - Waiting to be processed
- `processing` - Worker is generating the embedding
- `completed` - Embedding ready
- `failed` - Error occurred (check the `error` field)

## AWS Lightsail Deployment

Deploy the server container to AWS Lightsail for cloud hosting.

### 1. Install the AWS CLI

```bash
pip install awscli
aws configure
```

### 2. Create a Lightsail container service

```bash
aws lightsail create-container-service \
  --service-name embedding-queue \
  --power small \
  --scale 1
```

### 3. Build and push the server image

```bash
# Build the server image
docker build -t embedding-queue-server ./server

# Push to Lightsail
aws lightsail push-container-image \
  --service-name embedding-queue \
  --label server \
  --image embedding-queue-server
```

### 4. Deploy the container

Update `lightsail-deploy.json` with the image name from step 3, then:

```bash
aws lightsail create-container-service-deployment \
  --service-name embedding-queue \
  --cli-input-json file://lightsail-deploy.json
```

### 5. Get the public URL

```bash
aws lightsail get-container-services --service-name embedding-queue
```

The URL will be in the format `https://embedding-queue.xxxxx.us-east-1.cs.amazonlightsail.com`.

### 6. Run the worker locally

Point your local worker at the Lightsail server:

```bash
SERVER_URL=https://embedding-queue.xxxxx.us-east-1.cs.amazonlightsail.com \
AUTH_TOKEN=your-secret-token \
docker-compose up worker ollama
```

## Lightsail Configuration

Edit `lightsail-deploy.json` to customize:

| Field | Description |
|-------|-------------|
| `serviceName` | Lightsail service name |
| `environment.AUTH_TOKEN` | API authentication token |
| `publicEndpoint.healthCheck` | Health check settings |
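
As a starting point, here is a sketch of what `lightsail-deploy.json` can look like for this service. The image tag follows Lightsail's `:service.label.version` convention from step 3; the version suffix (`.1` here) depends on how many times you have pushed:

```json
{
  "serviceName": "embedding-queue",
  "containers": {
    "server": {
      "image": ":embedding-queue.server.1",
      "environment": {
        "AUTH_TOKEN": "your-secret-token"
      },
      "ports": {
        "8000": "HTTP"
      }
    }
  },
  "publicEndpoint": {
    "containerName": "server",
    "containerPort": 8000,
    "healthCheck": {
      "path": "/health"
    }
  }
}
```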
