Translation note (2025-10-30): This document is an English translation of
docs/de/ReadMe.md at commit 8d8c4b7d30a63adb857a251be6b1331529267e69.
Yul Yen's AI Orchestra is a locally running AI environment that combines multiple personas (Leah, Doris, Peter, Popcorn).
All personas are based on a local LLM (currently via Ollama or compatible backends), each with its own character and language style.
The project supports:
- Terminal UI with colored console output & streaming
- Web UI built on Gradio (accessible within the local network)
- AI dialog (self-talk) between two personas (terminal + web)
- Text-to-speech (TTS) with automatic WAV generation in terminal mode
- API (FastAPI) for integration into external applications
- Wikipedia integration (online or offline via Kiwix proxy)
- Security filters (prompt-injection protection & PII detection)
- Logging & tests for stable usage
See also: Features.md
Project goals:

- Provide a private, locally running AI for German-language interaction
- Multiple characters with distinct styles:
  - Leah: empathetic, friendly
  - Doris: sarcastic, humorous, cheeky
  - Peter: fact-oriented, analytical
  - Popcorn: playful, child-friendly
- Extensible foundation for future features (e.g., LoRA fine-tuning, tool use, RAG, STT)
- KISS principle: simple, transparent architecture
Architecture overview:

- Configuration: All settings centrally stored in `config.yaml`
- Core:
  - Swappable LLM core (`OllamaLLMCore`, `DummyLLMCore` for tests) including `YulYenStreamingProvider`
  - Wikipedia support including a spaCy-based keyword extractor
- Personas: System prompts & quirks in `src/config/personas.py`
- UI:
  - `TerminalUI` for the CLI
  - `WebUI` (Gradio) with persona selection & avatars
  - Optional ask-all broadcast mode (enable `ui.experimental.broadcast_mode`) via the Ask-All option in the terminal start menu and the Ask-All card in the web UI
- API: FastAPI server (`/ask` endpoint for one-shot questions)
- Logging:
  - Chat transcripts and system logs in `logs/`
  - The wiki proxy writes separate log files
Requirements:

- Python 3.10+
- Ollama (or another compatible backend) with an installed model, for example:

  ```shell
  ollama pull leo-hessianai-13b-chat:Q5
  ```

- For tests without Ollama you can set `core.backend: "dummy"` – the echo backend requires no additional downloads and is suitable for CI or quick prototyping.
- Optional for offline wiki usage: Kiwix + a German ZIM archive
Installation:

```shell
git clone https://github.com/YulYen/YulYens_AI.git
cd YulYens_AI

# Create virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
.venv\Scripts\activate      # Windows

# Install dependencies
pip install -r requirements.txt
```

The Wikipedia integration requires a spaCy model that matches your configured language. The keyword finder looks up the correct package via the combination of `language` and `wiki.spacy_model_variant`, using the mapping in `wiki.spacy_model_map` inside `config.yaml`. This keeps the model choice entirely in configuration, without hard-coded defaults.
Example:
```yaml
language: "en"

wiki:
  spacy_model_variant: "medium"
  spacy_model_map:
    en:
      medium: "en_core_web_md"
      large: "en_core_web_lg"
```

Additionally, you have to install the corresponding model manually:
```shell
# Medium model (balance between size and accuracy)
python -m spacy download en_core_web_md

# Large model (more accurate, but slower and uses more memory)
python -m spacy download en_core_web_lg
```

All central settings are controlled through `config.yaml`. Important toggles:
- `language`: controls UI texts and persona prompts (`"de"` or `"en"`).
- `ui.type`: selects the interface (`"terminal"`, `"web"`, or `null` for API only).
- `tts.enabled`: enables/disables text-to-speech.
- `tts.features.terminal_auto_create_wav`: attempts to create one WAV file per reply in terminal mode (currently Windows-only due to the `winsound` dependency in `tts.audio_player`).
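Returning to the spaCy model mapping described earlier, the lookup of the package name from `language` and `wiki.spacy_model_variant` can be sketched as follows. The helper name and the pass-through behavior for direct model names are illustrative assumptions, not the project's actual code:

```python
# Sketch of the variant-to-package lookup (hypothetical helper; the real
# keyword extractor in the project may be structured differently).
def resolve_spacy_model(config: dict) -> str:
    lang = config["language"]
    wiki = config["wiki"]
    variant = wiki["spacy_model_variant"]
    model_map = wiki["spacy_model_map"]
    # A direct model name (e.g. "en_core_web_lg") is passed through unchanged,
    # matching the "or direct model name" alternative mentioned in config.yaml.
    if lang not in model_map or variant not in model_map[lang]:
        return variant
    return model_map[lang][variant]

cfg = {
    "language": "en",
    "wiki": {
        "spacy_model_variant": "medium",
        "spacy_model_map": {
            "en": {"medium": "en_core_web_md", "large": "en_core_web_lg"},
        },
    },
}
print(resolve_spacy_model(cfg))  # → en_core_web_md
```

The resolved name is what `python -m spacy download …` must have installed beforehand.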
Example:
```yaml
language: "de"

core:
  # Choose backend: "ollama" (default) or "dummy" (echo backend for tests)
  backend: "ollama"
  # Default model for Ollama
  model_name: "leo-hessianai-13b-chat.Q5"
  # URL of the locally running Ollama server (protocol + host + port).
  # This value must be set explicitly – there is no silent default.
  ollama_url: "http://127.0.0.1:11434"
  # Warm-up: whether to send a dummy call to the model at startup.
  warm_up: false

ui:
  type: "terminal"  # Alternatives: "web" or null (API only)
  web:
    host: "0.0.0.0"
    port: 7860
    share: false  # Optional Gradio share (requires username/password)

wiki:
  mode: "offline"               # "offline", "online" or false (disabled)
  spacy_model_variant: "large"  # Alternatives: "medium" or a direct model name
  proxy_port: 8042
  snippet_limit: 1600           # Maximum length of a single snippet in characters
  max_wiki_snippets: 2          # Cap on how many different snippets can be injected per question
```

The key `core.backend` determines which LLM core is used:
- `ollama` (default) integrates a running Ollama server. The Python package `ollama` needs to be installed (e.g., via `pip install ollama`), and `core.ollama_url` must point to the Ollama instance.
- `dummy` uses the `DummyLLMCore`, which returns each input as `ECHO: ...`. This is ideal for unit tests, continuous integration, or demos without an available LLM. In this mode a placeholder for `core.ollama_url` is sufficient; neither a running Ollama server nor the Python package is required.
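For a CI run or a quick demo, a minimal `core` section using the echo backend might look like this (a sketch using only keys mentioned above; the placeholder URL is never contacted):

```yaml
core:
  backend: "dummy"                       # echo backend, no LLM required
  ollama_url: "http://127.0.0.1:11434"   # placeholder; no server needs to run
```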
The security section selects the guard for input and output checks:
- `security.guard: "BasicGuard"` (default) loads the built-in base protection. The toggles `prompt_injection_protection`, `pii_protection`, and `output_blocklist` control which checks are active.
- `security.guard: "DisabledGuard"` disables the checks via a stub. The aliases `"disabled"`, `"none"`, and `"off"` are accepted as well.
- `security.enabled: false` disables the guard logic entirely, regardless of the selected name.
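Put together, a `security` section might look like this (a sketch based on the toggles listed above; the exact nesting of the three check toggles under `security` is an assumption):

```yaml
security:
  enabled: true
  guard: "BasicGuard"   # or "DisabledGuard" (aliases: "disabled", "none", "off")
  prompt_injection_protection: true
  pii_protection: true
  output_blocklist: true
```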
- In offline mode (`wiki.mode: "offline"`), `kiwix-serve` can be started automatically when `wiki.offline.autostart: true` is set.
- `wiki.max_wiki_snippets` controls how many distinct Wikipedia excerpts may enter the prompt (default: 2), so multiple hits are useful without overloading the context.
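An offline wiki setup combining these options might be configured as follows (a sketch; key nesting is inferred from the option names in this section):

```yaml
wiki:
  mode: "offline"
  max_wiki_snippets: 2   # default: up to two distinct excerpts per question
  offline:
    autostart: true      # start kiwix-serve automatically
```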
Start the application with:

```shell
python src/launch.py -e classic
```

The `--ensemble` (short `-e`) parameter selects which ensemble definition to start. `classic` is the default choice for the regular experience. You can try another ensemble, such as the `spaceship_crew` example, by running:

```shell
python src/launch.py -e examples/spaceship_crew
```

For a complete walkthrough on building your own ensemble, see Adding a custom ensemble.
On Windows, replace `/` with `\` (`examples\spaceship_crew`).
You can optionally pass an alternative configuration file via --config (short -c) alongside the
ensemble parameter, for example:
```shell
python src/launch.py -e classic --config path/to/config.yaml
```

- **Terminal UI**
  - Used in the terminal when `ui.type: "terminal"`
  - Input: simply type your questions
  - Commands: `exit` (quit), `clear` (start a new conversation)
- **Web UI**
  - With `ui.type: "web"`, a web interface starts automatically
  - Open in the browser: `http://<host>:<port>` according to the `ui.web` settings (default: `http://127.0.0.1:7860`)
  - Optional: enable Gradio share via `ui.web.share: true`; credentials come from `ui.web.share_auth`
  - Pick a persona and start chatting
- **API only (no UI)**
  - Set `ui.type: null` – FastAPI keeps running and serves `/ask`
- **API (FastAPI)**
  - Automatically active when `api.enabled: true`
  - Example request using `curl`:

    ```shell
    curl -X POST http://127.0.0.1:8013/ask \
      -H "Content-Type: application/json" \
      -d '{"question":"Who developed the theory of relativity?", "persona":"LEAH"}'
    ```
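The same request can be issued from Python using only the standard library. This is a sketch assuming the API is enabled on port 8013 as in the `curl` example; the `build_ask_payload` and `ask` helper names are illustrative, not part of the project:

```python
# Minimal Python client for the /ask endpoint (sketch; requires a running
# server with api.enabled: true on port 8013).
import json
import urllib.request

def build_ask_payload(question: str, persona: str = "LEAH") -> bytes:
    """Encode the JSON body expected by the /ask endpoint."""
    return json.dumps({"question": question, "persona": persona}).encode("utf-8")

def ask(question: str, persona: str = "LEAH",
        url: str = "http://127.0.0.1:8013/ask") -> str:
    req = urllib.request.Request(
        url,
        data=build_ask_payload(question, persona),
        headers={"Content-Type": "application/json"},
    )
    # Network call: only succeeds while the FastAPI server is running.
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```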
**Question (Leah):**

> Who is Angela Merkel?

**Answer (streamed):**

> Angela Merkel is a German politician (CDU) who served as Chancellor of the Federal Republic of Germany from 2005 to 2021. …
Run the test suite with pytest:

```shell
pytest tests/
```

🚧 Work in progress – stable to use, but under active development (including initial LoRA fine-tuning experiments). This is a private project, not intended for production use.