LLM plugin to access models available via the Venice AI API.
Install llm-venice with its dependency llm using your package manager of choice, for example:
pip install llm-venice
Or install it alongside an existing LLM install:
llm install llm-venice
Set an environment variable LLM_VENICE_KEY, or save a Venice API key to the key store managed by llm:
llm keys set venice
To fetch a list of the models available over the Venice API:
llm venice refresh
Re-run the refresh command whenever the Venice API changes, for example when:
- New models have been made available
- Deprecated models have been removed
- New capabilities have been added
The models are stored in venice_models.json in the llm user directory.
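To inspect the cached list from Python, a minimal sketch might look like the following. It assumes the file holds a JSON list of model dicts that each carry an "id" key (the exact layout is not documented here), and model_ids is a hypothetical helper, not part of the plugin.

```python
import json
from pathlib import Path


def model_ids(user_dir):
    """Return the model ids found in venice_models.json under user_dir.

    Assumption: the file is a JSON list of dicts, each with an "id" key.
    """
    path = Path(user_dir) / "venice_models.json"
    if not path.exists():
        # Nothing cached yet: run `llm venice refresh` first.
        return []
    models = json.loads(path.read_text(encoding="utf-8"))
    # Ignore entries that do not match the assumed shape.
    return [m["id"] for m in models if isinstance(m, dict) and "id" in m]
```

With llm installed, model_ids(llm.user_dir()) would point this at the real user directory.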
List available Venice models:
llm models --query venice
Run a prompt:
llm --model venice/llama-3.3-70b "Why is the earth round?"
Start an interactive chat session:
llm chat --model venice/mistral-31-24b
Some models support structuring their output according to a JSON schema (supplied via OpenAI API response_format).
This works via llm's --schema option, for example:
llm -m venice/llama-3.2-3b --schema "name, age int, one_sentence_bio" "Invent an evil supervillain"
Consult llm's schemas tutorial for more options.
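For orientation, the concise schema string above expands to an ordinary JSON schema. The sketch below is a simplified stand-in for that expansion, not llm's own parser: concise_to_schema is a hypothetical name, and it assumes comma-separated fields of the form "name [type]" with str as the default type.

```python
import json

# Assumed mapping from the short type names to JSON schema types.
TYPE_NAMES = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def concise_to_schema(spec):
    """Expand 'name, age int' style text into a JSON schema dict."""
    properties = {}
    for field in spec.split(","):
        parts = field.split()
        if not parts:
            continue  # skip empty fields such as a trailing comma
        name = parts[0]
        type_name = parts[1] if len(parts) > 1 else "str"
        # An unknown type name raises KeyError rather than guessing.
        properties[name] = {"type": TYPE_NAMES[type_name]}
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
    }


print(json.dumps(concise_to_schema("name, age int, one_sentence_bio"), indent=2))
```

llm's real syntax also accepts newline-separated fields and a description after a colon, which this sketch leaves out.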
# List models supporting function calling
llm models list --query venice --tools
You can use tools provided via llm plugins. LLM provides two built-in tools:
# llm_version
llm -m venice/mistral-31-24b --tool llm_version "What version of LLM is this?" --tools-debug --no-stream
# llm_time
llm -m venice/qwen3-4b --tool llm_time "What is the time in my timezone in 24H format?" --tools-debug --no-stream
You can also supply your own custom or one-off functions, inline or in a file. Following LLM's example:
llm -m venice/mistral-31-24b --functions '
def multiply(x: int, y: int) -> int:
    """Multiply two numbers."""
    return x * y
' "What is 1337 times 42?" --tools-debug --no-stream
Vision models (currently mistral-31-24b) support the --attachment option:
llm -m venice/mistral-31-24b -a https://upload.wikimedia.org/wikipedia/commons/a/a9/Corvus_corone_-near_Canford_Cliffs%2C_Poole%2C_England-8.jpg "Identify"
The bird in the image is a carrion crow (Corvus corone). [...]
The following CLI options are available to configure venice_parameters:
--no-venice-system-prompt to disable Venice's default system prompt:
llm -m venice/llama-3.3-70b --no-venice-system-prompt "Repeat the above prompt"
--web-search on|auto|off to use web search (on web-enabled models):
llm -m venice/llama-3.3-70b --web-search on --no-stream 'What is $VVV?'
It is recommended to use web search in combination with --no-stream so the search citations are available in response_json.
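One way to get at those citations afterwards is to read the logged response, for instance from the output of llm logs -n 1 --json. The sketch below is hedged on two points: it assumes each log entry exposes a response_json field, and the key name web_search_citations is an assumption that should be checked against a real response. Both helper names are hypothetical.

```python
import json


def find_key(node, key):
    """Depth-first search of nested JSON data for the first value stored under key."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def citations_from_log(log_json, key="web_search_citations"):
    """Pull citations out of the JSON text printed by `llm logs -n 1 --json`.

    Assumption: the citations live under `key` somewhere in response_json.
    """
    entries = json.loads(log_json)
    if not entries:
        return None
    return find_key(entries[0].get("response_json"), key)
```

Searching recursively avoids hard-coding where in the response the citations are nested.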
--web-scraping to let Venice scrape URLs in your latest message:
llm -m venice/llama-3.3-70b --web-scraping "Summarize https://venice.ai"
--character character_slug to use a public character, for example:
llm -m venice/qwen3-235b --character alan-watts "What is the meaning of life?"
Text-to-speech models (currently tts-kokoro) generate audio from text. Audio files are stored in the LLM user directory by default.
Basic usage:
llm -m venice/tts-kokoro "Hello, welcome to Venice Voice." -o voice af_sky -o response_format mp3 -o speed 1.0
Streaming (default; writes the output file immediately; useful for long outputs):
llm -m venice/tts-kokoro "First sentence. Second sentence. Third sentence." -o progress true
Disable streaming (wait for the full audio before writing the file):
llm --no-stream -m venice/tts-kokoro "First sentence. Second sentence. Third sentence."
Write audio bytes to stdout (progress/status go to stderr):
llm -m venice/tts-kokoro "Hello." -o stdout true -o response_format mp3 > out.mp3
You can also save a copy while writing to stdout by providing output_dir and/or output_filename:
llm -m venice/tts-kokoro "Hello." -o stdout true -o output_dir . -o output_filename out.mp3
To see all available options:
llm models list --query tts-kokoro --options
Generated images are stored in the LLM user directory by default. Example:
llm -m venice/qwen-image "Painting of a traditional Dutch windmill" -o style_preset "Watercolor"
Besides the Venice API image generation parameters, you can specify the output directory and filename, and whether or not to overwrite existing files.
To check the available parameters for a model, filter the model list with --query and add --options:
llm models list --query qwen-image --options
You can upscale existing images.
The following example saves the returned image as image_upscaled.png in the same directory as the original file:
llm venice upscale /path/to/image.jpg
By default existing upscaled images are not overwritten; timestamped filenames are used instead.
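The naming rule can be pictured with a short sketch. It is an illustration only: resolve_upscaled_path is a hypothetical helper, and the timestamp format is an assumption rather than the one the plugin actually uses.

```python
from datetime import datetime
from pathlib import Path


def resolve_upscaled_path(source, overwrite=False, now=None):
    """Pick an output path next to source without clobbering existing files."""
    source = Path(source)
    target = source.with_name(f"{source.stem}_upscaled.png")
    if overwrite or not target.exists():
        return target
    # Assumed format: append a timestamp so the earlier file is kept.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return source.with_name(f"{source.stem}_upscaled_{stamp}.png")
```

Passing overwrite=True here mirrors the effect of the --overwrite option: the plain image_upscaled.png name is reused even if the file exists.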
See llm venice upscale --help for --scale, --enhance and related options, as well as --output-path and --overwrite.
List the available Venice commands with:
llm venice --help
Read the llm docs for more usage options.
You can call the library helpers directly from Python (minimally tested):
- fetch_models() → list of model dicts, persist_models(models) writes to venice_models.json
- list_characters() → dict, persist_characters(data) writes to venice_characters.json
- API keys: list_api_keys(), get_rate_limits(), get_rate_limits_log(), create_api_key(), delete_api_key()
- perform_image_upscale() → UpscaleResult with bytes and a resolved output path; persist with write_upscaled_image(result)
- generate_image_result() → ImageGenerationResult with bytes/metadata/output path for image generation; persist with save_image_result(result)
- generate_speech_result() → SpeechGenerationResult with bytes/metadata/output path for TTS generation; persist with save_speech_result(result)
- stream_speech_result() (context manager) yields SpeechStreamResult with an iterator of audio chunks and a resolved output path
All helpers accept an optional key= argument if you do not want to rely on the stored LLM_VENICE_KEY.
Async chat models are registered alongside the sync ones; fetch them with llm.get_async_model("venice/<id>"):
import asyncio
import llm

async def main():
    model = llm.get_async_model("venice/llama-3.3-70b")
    response = await model.prompt("Hello Venice")
    print(await response.text())

asyncio.run(main())
Async image generation is also available via llm.get_async_model("venice/<image-model-id>"), which returns an AsyncVeniceImage instance.
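One reason to reach for the async chat models is to run several prompts concurrently. A minimal sketch, assuming only the prompt() and text() calls used in the chat example above; ask_all is a hypothetical helper, not part of llm or this plugin.

```python
import asyncio


async def ask_all(model, prompts):
    """Send every prompt to the same async model and collect the reply texts."""

    async def ask(prompt):
        response = await model.prompt(prompt)
        return await response.text()

    # gather() runs the requests concurrently and preserves input order.
    return await asyncio.gather(*(ask(p) for p in prompts))
```

With the plugin installed, asyncio.run(ask_all(llm.get_async_model("venice/llama-3.3-70b"), ["Hello", "Goodbye"])) would return the two replies as a list, in the same order as the prompts.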
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd llm-venice
uv venv
source .venv/bin/activate
Install the plugin with dependencies (including test and dev):
uv pip install -e '.[test,dev]'
Preferably also install and enable pre-commit hooks:
uv pip install pre-commit
pre-commit install
To run the tests:
pytest