agilord/llamacpp_rpc_client


HTTP client bindings to call the llama.cpp RPC server.

Usage

import 'package:llamacpp_rpc_client/llamacpp_rpc_client.dart';

void main() async {
  final client = LlamacppRpcClient('http://localhost:8080');

  // Text completion
  final completion = await client.completion(
    'The capital of France is',
    options: CompletionOptions(
      maxTokens: 50,
      temperature: 0.7,
    ),
  );
  print(completion.content);

  // Streaming completion
  await for (final chunk in client.streamCompletion('Tell me a story')) {
    print(chunk.content);
  }

  // Text embedding
  final embedding = await client.embedding('Hello world');
  print(embedding.embedding.length);

  client.close();
}
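Embedding vectors are commonly compared with cosine similarity. A minimal sketch follows; the `cosine` helper is illustrative and not part of the package, and it assumes `embedding.embedding` is a `List<double>`, as the `.length` access above suggests:

```dart
import 'dart:math';

import 'package:llamacpp_rpc_client/llamacpp_rpc_client.dart';

/// Cosine similarity between two equal-length vectors:
/// dot(a, b) / (|a| * |b|), ranging from -1.0 to 1.0.
double cosine(List<double> a, List<double> b) {
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (sqrt(normA) * sqrt(normB));
}

void main() async {
  final client = LlamacppRpcClient('http://localhost:8080');

  // Embed two related phrases and compare them.
  final a = await client.embedding('machine learning');
  final b = await client.embedding('deep learning');
  print(cosine(a.embedding, b.embedding)); // values near 1.0 mean more similar

  client.close();
}
```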

CLI Usage

The package also includes a command-line interface for interacting with llama.cpp servers:

Completion Command

Generate text completions:

dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "The capital of France is" \
  --temperature 0.7 \
  --max-tokens 50

# Stream completion in real-time
dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "Tell me a story" \
  --stream

# Deterministic generation with seed
dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "Hello world" \
  --seed 42

Options:

  • --url, -u: Base URL of the llama.cpp RPC server (required)
  • --prompt, -p: Input prompt for completion (required)
  • --temperature, -t: Sampling temperature (0.0-2.0); higher values produce more random output
  • --max-tokens, -m: Maximum tokens to generate
  • --top-p: Nucleus sampling parameter (0.0-1.0)
  • --top-k: Top-k sampling parameter
  • --seed: Random seed for deterministic generation
  • --stream, -s: Stream completion in real-time

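The sampling flags can be combined in a single invocation. For example, a sketch that constrains generation with both top-k and nucleus (top-p) filtering, assuming the server applies the samplers together as llama.cpp does by default:

```shell
# Combine temperature, top-k, and nucleus sampling in one request
dart run bin/llamacpp_rpc_client.dart completion \
  --url http://localhost:8080 \
  --prompt "The capital of France is" \
  --temperature 0.7 \
  --top-k 40 \
  --top-p 0.9
```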
Embedding Command

Generate text embeddings:

dart run bin/llamacpp_rpc_client.dart embedding \
  --url http://localhost:8080 \
  --input "machine learning"

# Output raw embedding values
dart run bin/llamacpp_rpc_client.dart embedding \
  --url http://localhost:8080 \
  --input "artificial intelligence" \
  --raw

Options:

  • --url, -u: Base URL of the llama.cpp RPC server (required)
  • --input, -i: Input text for embedding generation (required)
  • --raw, -r: Output raw embedding vector values

About

HTTP client bindings to call the llama.cpp RPC server in Dart
