This repository contains scripts for running and interacting with a llama.cpp server.
- `start-local-server.sh` - Bash script to start the llama.cpp server locally
- `start-docker-server.sh` - Bash script to start the server using Docker
- `chat_client.py` - Python client for chatting with the llama.cpp server
- `examples.py` - Example usage of the Python chat client
Install the dependencies using pip:
```bash
pip install -r requirements.txt
```

Or install the project in editable mode:
```bash
pip install -e .
```

Use the provided script to start your server:
```bash
./start-local-server.sh
```

The script will:
- Allow you to configure server settings via environment variables or command-line arguments
- Show available models in your models directory
- Let you select which model to use
- Let you choose between foreground and background mode
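Once the script reports that the server is up, you can verify it is responding. A minimal check, assuming the default port of 8000:

```bash
# Quick sanity check; assumes the server is listening on localhost:8000.
curl http://localhost:8000/health
```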
Start an interactive chat session:
```bash
python chat_client.py
```

With custom server settings:
```bash
python chat_client.py --host localhost --port 8000
```

Send a single message:
```bash
python chat_client.py --message "Hello, how are you?"
```

Enable streaming for real-time responses:
```bash
python chat_client.py --stream
```

The interactive session supports the following commands:

- `help` - Show available commands
- `system <prompt>` - Set a system prompt
- `clear` - Clear the system prompt
- `stream` - Toggle streaming mode on/off
- `quit` or `exit` - End the conversation
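The command-line flags shown above can also be combined. For example, a one-shot streamed message against a non-default host (illustrative values, assuming the client accepts multiple flags in one invocation):

```bash
# Illustrative: stream a single message to a remote server.
python chat_client.py --host 192.168.1.50 --port 8000 --message "Tell me a joke" --stream
```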
You can also use the client directly from Python:

```python
from chat_client import LlamaCppClient

# Create client
client = LlamaCppClient(host="localhost", port=8000)

# Check if server is running
if client.check_health():
    # Send a message
    response = client.chat_completion("Hello!")
    print(response)
```

With a system prompt and custom generation parameters:

```python
response = client.chat_completion(
    message="Write a Python function to calculate fibonacci numbers",
    system_prompt="You are a helpful coding assistant",
    temperature=0.3,
    max_tokens=500
)
```

Streaming a response chunk by chunk:

```python
for chunk in client.stream_chat_completion("Tell me a story"):
    print(chunk, end="", flush=True)
```

You can configure the server using environment variables:
- `LLAMA_HOST` - Server host (default: `0.0.0.0`)
- `LLAMA_PORT` - Server port (default: `8000`)
- `LLAMA_MODELS_PATH` - Path to model files (default: `/models`)
- `LLAMA_CONTEXT_SIZE` - Context size (default: `512`)
- `LLAMA_GPU_LAYERS` - Number of layers to offload to the GPU (default: `99`)
- `LLAMA_LOG_FILE` - Log file name (default: `llama-server.log`)
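For example, you might export a few of these before launching the server (the values below are illustrative):

```bash
# Illustrative overrides; adjust the port, model path, and context size
# to match your setup before running the start script.
export LLAMA_PORT=8080
export LLAMA_MODELS_PATH="$HOME/models"
export LLAMA_CONTEXT_SIZE=4096
./start-local-server.sh
```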
The server script supports various command-line arguments that override environment variables:
- `--host` - Server host
- `-p, --port` - Server port
- `-m, --models-path` - Path to model files
- `-c, --context-size` - Context size
- `-g, --gpu-layers` - GPU layers
- `-l, --log-file` - Log file name
- `-h, --help` - Show help message
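Equivalently, the same settings can be passed as flags, which take precedence over any environment variables (values are illustrative):

```bash
# Flags override LLAMA_* environment variables.
./start-local-server.sh --host 127.0.0.1 -p 8080 -m "$HOME/models" -c 4096 -l my-server.log
```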
Run the example script to see different usage patterns:
```bash
python examples.py
```

This will demonstrate:
- Simple chat interactions
- Using system prompts
- Streaming responses
- Multi-turn conversations
The llama.cpp server provides OpenAI-compatible endpoints:
- `GET /health` - Health check
- `GET /v1/models` - List available models
- `POST /v1/chat/completions` - Chat completions (streaming and non-streaming)
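Because the endpoints are OpenAI-compatible, you can call them with plain HTTP. A minimal sketch with curl, assuming the default port; the model name is a placeholder, so use one returned by `GET /v1/models`:

```bash
# List the models the server knows about.
curl http://localhost:8000/v1/models

# Request a non-streaming chat completion.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "your-model.gguf",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```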
Troubleshooting common issues:

- Server not responding: Make sure the llama.cpp server is running and accessible
- Import errors: Install dependencies with `pip install -r requirements.txt`
- Connection refused: Check if the host and port are correct
- No models found: Ensure your models directory contains .gguf files
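A couple of quick checks that often narrow things down (paths and file names assume the defaults listed above):

```bash
# Does the models directory contain any GGUF files? (default: /models)
ls /models/*.gguf

# Any errors during startup? (default log file name)
tail -n 50 llama-server.log
```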
Requirements:

- Python 3.10+
- requests library
- llama.cpp server running locally or remotely