Install Ollama
Run Llama 3.2:1b:
ollama run llama3.2:1b
Start the LiteLLM proxy:
cd litellm
docker compose up -d
Test the API once:
curl http://localhost:4000/v1/models \
-H "Authorization: Bearer freellm"
curl http://localhost:4000/v1/chat/completions \
-H "Authorization: Bearer freellm" \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/llama3.2:1b",
"messages": [{"role": "user", "content": "Explain vector databases in two sentences."}]
}'
Test the public endpoint:
curl https://freellm.quick-labs.io/v1/models \
-H "Authorization: Bearer freellm"
curl https://freellm.quick-labs.io/v1/chat/completions \
-H "Authorization: Bearer freellm" \
-H "Content-Type: application/json" \
-d '{
"model": "ollama/llama3.2:1b",
"messages": [{"role": "user", "content": "Explain vector databases in two sentences."}]
}'
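The proxy returns JSON in the OpenAI chat-completions shape, so the reply text always sits at the same path. A minimal sketch of extracting it in Python (the sample payload below is illustrative, not a captured response):

```python
import json

# Illustrative response in the OpenAI chat-completions shape returned by
# the LiteLLM proxy; the field values here are made up for the example.
raw = '''{
  "id": "chatcmpl-123",
  "model": "ollama/llama3.2:1b",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "A vector database stores embeddings."},
     "finish_reason": "stop"}
  ]
}'''

resp = json.loads(raw)
# The assistant reply lives at choices[0].message.content.
reply = resp["choices"][0]["message"]["content"]
print(reply)
```

The same path works for any OpenAI-compatible backend behind the proxy, which is the point of fronting Ollama with LiteLLM.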
Set Up Cloudflare Reverse Proxy
export ANSIBLE_HOST_KEY_CHECKING=False
- Install the required community collections:
ansible-galaxy collection install -r ansible/collections/requirements.yml
- Update ansible/inventory/hosts.ini with your host name or IP and SSH user.
- Harden the host firewall and Docker networking:
ansible-playbook \
  -i ansible/inventory/hosts.ini \
  ansible/playbooks/harden_firewall.yml \
  --private-key ~/workspace/<private key>.pem
- Install Docker CE and supporting packages:
ansible-playbook \
  -i ansible/inventory/hosts.ini \
  ansible/playbooks/install_docker.yml \
  --private-key ~/workspace/<private key>.pem
Role defaults keep SSH (TCP/22) open and lock down the proxy port (TCP/8001) to the CIDR list you supply via allowed_8001_cidrs. Override these variables in your inventory or on the command line to match your environment.
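For example, an inventory entry might pin the host and narrow the proxy allowlist like this (the group name, host address, user, and CIDR are placeholders, not values from this repository):

```ini
[llm_hosts]
proxy-host ansible_host=203.0.113.10 ansible_user=ubuntu

[llm_hosts:vars]
; Only this range may reach TCP/8001; override to match your network.
allowed_8001_cidrs=["203.0.113.0/24"]
```

Alternatively, pass the override on the command line with -e 'allowed_8001_cidrs=["203.0.113.0/24"]'.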
Test the proxied endpoint:
curl http://<host>:8001/v1/chat/completions \
-H "Authorization: Bearer SECRETKEY123" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [{"role": "user", "content": "Explain vector databases in two sentences."}]
}'
Use the official image to start a GPU-enabled vLLM instance:
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=$HF_TOKEN" \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model Qwen/Qwen3-0.6B
Open ports 8002 and 8003 to the allowed source address:
sudo iptables -I INPUT -s 18.236.254.81 -p tcp --dport 8002 -j ACCEPT
sudo iptables -I INPUT -s 18.236.254.81 -p tcp --dport 8003 -j ACCEPT
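The iptables rules above and the allowed_8001_cidrs variable implement the same idea: an IP allowlist. Before opening a port, you can sanity-check whether a client address falls inside a planned CIDR with Python's stdlib ipaddress module (a sketch; the source address is the one used in the rules above):

```python
import ipaddress

def is_allowed(client_ip: str, cidrs: list[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)

# A single allowed source address, as in the iptables rules, is a /32:
allowed = ["18.236.254.81/32"]
print(is_allowed("18.236.254.81", allowed))  # True
print(is_allowed("203.0.113.5", allowed))    # False
```

The same check applies to the CIDR list you pass as allowed_8001_cidrs, which is useful for verifying a range before locking yourself out.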