Skip to content

ar-jan/llm-venice

Repository files navigation

llm-venice

PyPI Changelog Tests License

LLM plugin to access models available via the Venice AI API.

Installation

Install llm-venice with its dependency llm using your package manager of choice, for example:

pip install llm-venice

Or install it alongside an existing LLM install:

llm install llm-venice

Configuration

Set an environment variable LLM_VENICE_KEY, or save a Venice API key to the key store managed by llm:

llm keys set venice

To fetch a list of the models available over the Venice API:

llm venice refresh

You should re-run the refresh command upon changes to the Venice API, when:

  • New models have been made availabe
  • Deprecated models have been removed
  • New capabilities have been added

The models are stored in venice_models.json in the llm user directory.

Usage

List available Venice models:

llm models --query venice

Prompting

Run a prompt:

llm --model venice/llama-3.3-70b "Why is the earth round?"

Start an interactive chat session:

llm chat --model venice/mistral-31-24b

Structured Outputs

Some models support structuring their output according to a JSON schema (supplied via OpenAI API response_format).

This works via llm's --schema options, for example:

llm -m venice/llama-3.2-3b --schema "name, age int, one_sentence_bio" "Invent an evil supervillain"

Consult llm's schemas tutorial for more options.

Tools (function calling)

⚠️ Warning: tools can be dangerous!

# List models supporting function calling
llm models list --query venice --tools

You can use tools provided via llm plugins. LLM provides two built-in tools:

# llm_version
llm -m venice/mistral-31-24b --tool llm_version "What version of LLM is this?" --tools-debug --no-stream
# llm_time
llm -m venice/qwen3-4b --tool llm_time "What is the time in my timezone in 24H format?" --tools-debug --no-stream

You can also provide your own custom or one-off functions provided inline or in a file. Following LLM's example:

llm -m venice/mistral-31-24b --functions '
def multiply(x: int, y: int) -> int:
    """Multiply two numbers."""
    return x * y
' "What is 1337 times 42?" --tools-debug --no-stream

Vision models

Vision models (currently mistral-31-24b) support the --attachment option:

llm -m venice/mistral-31-24b -a https://upload.wikimedia.org/wikipedia/commons/a/a9/Corvus_corone_-near_Canford_Cliffs%2C_Poole%2C_England-8.jpg "Identify"
The bird in the image is a carrion crow (Corvus corone). [...]

venice_parameters

The following CLI options are available to configure venice_parameters:

--no-venice-system-prompt to disable Venice's default system prompt:

llm -m venice/llama-3.3-70b --no-venice-system-prompt "Repeat the above prompt"

--web-search on|auto|off to use web search (on web-enabled models):

llm -m venice/llama-3.3-70b --web-search on --no-stream 'What is $VVV?'

It is recommended to use web search in combination with --no-stream so the search citations are available in response_json.

--web-scraping to let Venice scrape URLs in your latest message:

llm -m venice/llama-3.3-70b --web-scraping "Summarize https://venice.ai"

--character character_slug to use a public character, for example:

llm -m venice/qwen3-235b --character alan-watts "What is the meaning of life?"

Text-to-speech

Text-to-speech models (currently tts-kokoro) generate audio from text. Audio files are stored in the LLM user directory by default.

Basic usage:

llm -m venice/tts-kokoro "Hello, welcome to Venice Voice." -o voice af_sky -o response_format mp3 -o speed 1.0

Streaming (default; writes the output file immediately; useful for long outputs):

llm -m venice/tts-kokoro "First sentence. Second sentence. Third sentence." -o progress true

Disable streaming (wait for the full audio before writing the file):

llm --no-stream -m venice/tts-kokoro "First sentence. Second sentence. Third sentence."

Write audio bytes to stdout (progress/status go to stderr):

llm -m venice/tts-kokoro "Hello." -o stdout true -o response_format mp3 > out.mp3

You can also save a copy while writing to stdout by providing output_dir and/or output_filename:

llm -m venice/tts-kokoro "Hello." -o stdout true -o output_dir . -o output_filename out.mp3

To see all available options:

llm models list --query tts-kokoro --options

Image generation

Generated images are stored in the LLM user directory by default. Example:

llm -m venice/qwen-image "Painting of a traditional Dutch windmill" -o style_preset "Watercolor"

Besides the Venice API image generation parameters, you can specify the output directory and filename, and whether or not to overwrite existing files.

You can check the available parameters for a model by filtering the model list with --query, and show the --options:

llm models list --query qwen-image --options

Image upscaling

You can upscale existing images. The following example saves the returned image as image_upscaled.png in the same directory as the original file:

llm venice upscale /path/to/image.jpg.

By default existing upscaled images are not overwritten; timestamped filenames are used instead.

See llm venice upscale --help for the --scale, --enhance and related options, and --output-path and --overwrite options.

Venice commands

List the available Venice commands with:

llm venice --help


Read the llm docs for more usage options.

Programmatic use

You can call the library helpers directly from Python (minimally tested):

  • fetch_models() → list of model dicts, persist_models(models) writes to venice_models.json
  • list_characters() → dict, persist_characters(data) writes to venice_characters.json
  • API keys: list_api_keys(), get_rate_limits(), get_rate_limits_log(), create_api_key(), delete_api_key()
  • perform_image_upscale()UpscaleResult with bytes and a resolved output path; persist with write_upscaled_image(result)
  • generate_image_result()ImageGenerationResult with bytes/metadata/output path for image generation; persist with save_image_result(result)
  • generate_speech_result()SpeechGenerationResult with bytes/metadata/output path for TTS generation; persist with save_speech_result(result)
  • stream_speech_result() (context manager) yields SpeechStreamResult with an iterator of audio chunks and a resolved output path

All helpers accept an optional key= argument if you do not want to rely on the stored LLM_VENICE_KEY.

Async usage

Async chat models are registered alongside the sync ones; fetch them with llm.get_async_model("venice/<id>"):

import asyncio
import llm


async def main():
    model = llm.get_async_model("venice/llama-3.3-70b")
    response = await model.prompt("Hello Venice")
    print(await response.text())


asyncio.run(main())

Async image generation is also available via llm.get_async_model("venice/<image-model-id>"), which returns an AsyncVeniceImage instance.

Development

To set up this plugin locally, first checkout the code. Then create a new virtual environment:

cd llm-venice
uv venv
source .venv/bin/activate

Install the plugin with dependencies (including test and dev):

uv pip install -e '.[test,dev]'

Preferably also install and enable pre-commit hooks:

uv pip install pre-commit
pre-commit install

To run the tests:

pytest

Contributors

Languages