PokeClaw is a voice-controlled AI desktop assistant built on a Raspberry Pi Zero W with the PiSugar WhisPlay board. Press the physical button to speak, and get a streamed response on the 1.54" LCD, accompanied by an animated character ("kirby" by default, or "lobster") and optional TTS playback.
PokeClaw is a detached fork of pizero-openclaw. A huge thanks to the original author, sebastianvkl, for their MIT-licensed contributions, which provided the solid hardware driver and communication foundation for this project!
Button press → Record audio → Transcribe (OpenAI/Gemini/GLM) → Stream LLM response (OpenClaw) → Real-time Display on LCD
↓
(Optional) Speak aloud (TTS)
- Press & hold the button to record your voice via ALSA
- Release — the WAV is sent to OpenAI, Gemini, or Zhipu GLM for ultra-fast transcription (~0.7s)
- The transcript is streamed to your OpenClaw gateway for a response
- Text streams onto the LCD in real time with pixel-accurate word wrapping, like a typewriter
- Optionally speaks the response via TTS as soon as the first sentence completes. Includes a Smart Preprocessing System: automatically converts numbers to spoken Chinese, masks unreadable Markdown tables with placeholders, converts bullet points to ordered ordinals ("First...", "Second..."), and strips formatting such as bold or italics. Meanwhile, the original raw text (with formatting and digits) is still displayed on the screen for a clear visual-vocal separation.
- The idle screen shows a clock, date, battery %, and WiFi status
- When active, the character animation loops fluidly between listening, thinking, and talking states. When talking, it automatically lip-syncs to the volume (RMS) of the TTS output!
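The "speak as soon as the first sentence completes" step above can be sketched as a small buffer that consumes streamed text chunks and yields each sentence the moment it is complete. This is a minimal illustration, not PokeClaw's actual code: the name `iter_sentences` and the choice of sentence-ending characters are assumptions.

```python
from typing import Iterable, Iterator

# Assumed terminators that always end a sentence (ASCII and full-width).
HARD_END = set("!?。！？\n")


def iter_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of chunks."""
    buffer = ""
    for chunk in chunks:
        for ch in chunk:
            # A "." ends a sentence only when whitespace follows it,
            # so a decimal such as "129.80" is not split in the middle.
            if buffer.endswith(".") and ch.isspace():
                yield buffer.strip()
                buffer = ""
                continue
            buffer += ch
            if ch in HARD_END:
                sentence = buffer.strip()
                if sentence:
                    yield sentence
                buffer = ""
    # Flush whatever remains when the stream ends without a terminator.
    tail = buffer.strip()
    if tail:
        yield tail
```

Each yielded sentence could be handed to the TTS engine while the remaining text is still streaming to the LCD, which is what keeps the spoken reply close behind the on-screen text.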
The device includes a silence gate to skip empty recordings, and OpenClaw automatically maintains your conversation memory across exchanges via cloud session keys.
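A silence gate of this kind can be sketched with the standard library alone: compute the RMS of the recorded samples and skip the upload when it falls below `SILENCE_RMS_THRESHOLD`. The helper names below are hypothetical, and the sketch assumes 16-bit PCM WAV recordings; the project's own implementation may differ.

```python
import array
import math
import sys
import wave

SILENCE_RMS_THRESHOLD = 200  # default value of the SILENCE_RMS_THRESHOLD setting


def pcm16_rms(samples) -> float:
    """Root-mean-square amplitude of a sequence of signed 16-bit samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def wav_rms(path: str) -> float:
    """RMS of a WAV file, assuming 16-bit PCM audio (an assumption of this sketch)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return pcm16_rms(samples)


def is_silent(path: str, threshold: float = SILENCE_RMS_THRESHOLD) -> bool:
    """True when the recording is too quiet to be worth transcribing."""
    return wav_rms(path) < threshold
```

Calling `is_silent("recording.wav")` before the transcription request would let the device return to the idle screen without spending an API call on an empty recording.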
- Raspberry Pi Zero 2 W (or Pi Zero W)
- PiSugar WhisPlay board — 1.54" LCD (240x240), push-to-talk button, LED, speaker, microphone
- PiSugar battery (optional) — reads and shows charge level on screen
- Raspberry Pi OS (Bookworm or later)
- Python 3.11+
- API keys for speech-to-text and TTS (OpenAI, Google Gemini, Zhipu GLM, or ByteDance Doubao)
- An OpenClaw gateway running somewhere accessible on your network
Important: Since this project supports rendering Chinese characters on the screen, you must install the Chinese font library (fonts-wqy-microhei) to prevent text corruption.
sudo apt install python3-numpy python3-pil fonts-wqy-microhei
pip install requests python-dotenv
Ensure the WhisPlay hardware driver is installed and loaded properly per the PiSugar WhisPlay setup guide.
Copy the example env file and fill in your keys:
cp .env.example .env
Edit .env:
export OPENAI_API_KEY="sk-your-openai-api-key"
export AUDIO_PROVIDER="doubao" # "openai", "gemini", "glm", or "doubao"
export DISPLAY_CHARACTER="lobster" # defaults to "kirby". Options: "kirby" or "lobster"
export PI_USER="pi" # Change this if your Raspberry Pi username is different
export GLM_API_KEY="your-glm-api-key"
export DOUBAO_APPID="your-appid"
export DOUBAO_ACCESS_TOKEN="your-token"
export OPENCLAW_TOKEN="your-openclaw-gateway-token"
Run the assistant:
python3 -m core.main
Or deploy as a systemd background service using the included sync.sh script.
Advanced settings can be configured via environment variables (in .env); their defaults are hardcoded in core/config.py:
| Variable | Default | Description |
|---|---|---|
| AUDIO_PROVIDER | openai | API provider for STT & TTS (openai, gemini, glm, or doubao) |
| DOUBAO_APPID | (required if doubao) | Doubao/Volcengine AppID |
| DOUBAO_ACCESS_TOKEN | (required if doubao) | Doubao Bearer Token |
| DOUBAO_VOICE_TYPE | bv001_streaming | Doubao voice selection code |
| DISPLAY_CHARACTER | kirby | The character sprite animation pack (kirby or lobster) |
| OPENAI_API_KEY | (required if openai) | OpenAI API key |
| GEMINI_API_KEY | (required if gemini) | Gemini API key |
| GLM_API_KEY | (required if glm) | Zhipu GLM API key |
| OPENCLAW_TOKEN | (required) | Auth token for the OpenClaw gateway |
| OPENCLAW_BASE_URL | https://... | OpenClaw gateway URL |
| ENABLE_TTS | false | Speak responses aloud |
| LCD_BACKLIGHT | 70 | Backlight brightness (0–100) |
| SILENCE_RMS_THRESHOLD | 200 | Audio RMS below this is skipped |
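As a rough illustration of how these variables and defaults might be read, the sketch below loads a few of them from the environment and checks that the keys required by the selected provider are present. It is not the contents of core/config.py: the `Settings` dataclass, `load_settings`, and the set of strings accepted as "true" for ENABLE_TTS are all assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Credentials each provider needs, taken from the table above.
REQUIRED_KEYS = {
    "openai": ("OPENAI_API_KEY",),
    "gemini": ("GEMINI_API_KEY",),
    "glm": ("GLM_API_KEY",),
    "doubao": ("DOUBAO_APPID", "DOUBAO_ACCESS_TOKEN"),
}
TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings for boolean flags


@dataclass(frozen=True)
class Settings:
    audio_provider: str
    display_character: str
    enable_tts: bool
    lcd_backlight: int
    silence_rms_threshold: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    provider = env.get("AUDIO_PROVIDER", "openai").lower()
    if provider not in REQUIRED_KEYS:
        raise ValueError(f"unknown AUDIO_PROVIDER: {provider!r}")
    needed = (*REQUIRED_KEYS[provider], "OPENCLAW_TOKEN")
    missing = [key for key in needed if not env.get(key)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    backlight = int(env.get("LCD_BACKLIGHT", "70"))
    if not 0 <= backlight <= 100:
        raise ValueError("LCD_BACKLIGHT must be between 0 and 100")
    return Settings(
        audio_provider=provider,
        display_character=env.get("DISPLAY_CHARACTER", "kirby"),
        enable_tts=env.get("ENABLE_TTS", "false").lower() in TRUTHY,
        lcd_backlight=backlight,
        silence_rms_threshold=int(env.get("SILENCE_RMS_THRESHOLD", "200")),
    )
```

Failing fast on a missing key at startup is a design choice of this sketch; it surfaces a misconfigured .env before the first button press rather than in the middle of a request.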
To make the assistant's voice sound more natural, the project includes a built-in preprocessing engine optimized for Chinese:
- Digit-to-Chinese: Automatically converts numbers such as 129.80 to their Chinese reading, and handles years (e.g., 2025 is read digit by digit).
- Markdown Stripping: Automatically removes bold (**), italic (*), inline code (`), headers (#), and links.
- Structural Content Recognition:
  - Table Masking: Detects Markdown tables and replaces them with a prompt ("I've summarized a table here for you to read on screen") to avoid reading gibberish.
  - List Optimization: Converts unordered lists (-) into ordered readings ("First...", "Second...").
- Visual-Vocal Separation: The LCD displays the original formatted Markdown text, while the TTS plays only the cleaned, natural speech.
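The preprocessing rules above can be approximated in a few dozen lines. The following is a simplified sketch rather than the project's engine: all function names are hypothetical, the placeholder and ordinals are English stand-ins, integers above 9999 are simply read digit by digit, and a bare four-digit number starting with 19 or 20 is assumed to be a year.

```python
import re

# English stand-ins for the spoken placeholder and ordinals (assumed wording).
TABLE_PLACEHOLDER = "I've summarized a table here for you to read on screen."
ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth"]
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]
NUMBER = re.compile(r"(\d+)(?:\.(\d+))?")


def read_digits(digits: str) -> str:
    """Read a digit string one digit at a time, e.g. 2025 -> 二零二五."""
    return "".join(DIGITS[int(d)] for d in digits)


def int_to_chinese(n: int) -> str:
    """Spoken Chinese for an integer in 0..9999, e.g. 129 -> 一百二十九."""
    if n == 0:
        return DIGITS[0]
    out, zero_pending, text = "", False, str(n)
    for i, ch in enumerate(text):
        digit, place = int(ch), len(text) - 1 - i
        if digit == 0:
            zero_pending = True
            continue
        if zero_pending:
            out += DIGITS[0]  # a run of inner zeros is read as one 零
        zero_pending = False
        out += DIGITS[digit] + UNITS[place]
    return out[1:] if 10 <= n <= 19 else out  # 一十二 is spoken as 十二


def number_to_chinese(match: re.Match) -> str:
    whole, frac = match.group(1), match.group(2)
    if frac is None and len(whole) == 4 and whole[:2] in ("19", "20"):
        return read_digits(whole)  # assumed year heuristic
    spoken = read_digits(whole) if len(whole) > 4 else int_to_chinese(int(whole))
    return spoken + ("点" + read_digits(frac) if frac else "")


def strip_inline(text: str) -> str:
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # keep link labels only
    text = re.sub(r"^#{1,6}\s*", "", text)                # header markers
    text = re.sub(r"[*`]+", "", text)                     # bold, italic, inline code
    return NUMBER.sub(number_to_chinese, text)


def preprocess_for_tts(markdown: str) -> str:
    spoken, bullet_index, in_table = [], 0, False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("|"):
            if not in_table:
                spoken.append(TABLE_PLACEHOLDER)  # one placeholder per table
            in_table, bullet_index = True, 0
            continue
        in_table = False
        bullet = re.match(r"[-*+]\s+(.*)", stripped)
        if bullet:
            label = ORDINALS[bullet_index] if bullet_index < len(ORDINALS) else f"Item {bullet_index + 1}"
            bullet_index += 1
            spoken.append(f"{label}, {strip_inline(bullet.group(1))}")
        else:
            bullet_index = 0
            spoken.append(strip_inline(stripped))
    return "\n".join(spoken)
```

Only the string returned by `preprocess_for_tts` would go to the TTS engine; the untouched Markdown is what the LCD renders, which is the visual-vocal separation described above.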
(See .env.example for all advanced configuration options)
- Support more underlying LLM/TTS/STT API models
- Develop more character animations and support richer emotional expressions (happy, angry, sad, etc.)
MIT License
This project was originally forked from pizero-openclaw. Thank you to the open-source community!