For years, I’ve been advocating for speakerless voice assistants — mainly because 99% of us already have a media player in our homes, right?
This project contains three companion devices designed to work with Home Assistant and ESPHome: VoiceSensor, VoiceEar, and VoiceScreen.
VoiceSensor is the original prototype of this project (and the name of this repo).
It was a single device containing:
- a microphone
- a presence sensor
- a light sensor
The idea: these sensors naturally belong together and are usually mounted in the same place — high on a wall or ceiling.
VoiceEar is a simpler version of VoiceSensor. It contains only a microphone, acting purely as an “ear” for your voice assistant.
Audio responses can be sent to any Home Assistant media player, so the device itself does not need a speaker.
All devices — VoiceSensor, VoiceEar, Respeaker Lite, and the entire Xiaozhi-ESPHome lineup — send the following events:
- Audio Path: Used to route the assistant’s audio output to any chosen media player via automation.
- Request Text: The recognized speech (what the user asked). Can be forwarded to VoiceScreen or a dashboard.
- Response Text: The assistant’s reply in text form. Can also be sent to VoiceScreen or dashboards.
- Phase ID: The current pipeline step. Useful for showing different visuals on VoiceScreen depending on the assistant’s state.
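
As a rough illustration of the Audio Path routing described above, a Home Assistant automation could look something like the sketch below. The event name `esphome.voice_audio_path`, the data field `audio_path`, and the entity `media_player.living_room` are all placeholders invented for this example; check the automations shipped with this repo for the real names.

```yaml
# Sketch only: event name, data field, and entity ID are hypothetical.
- alias: "Route assistant audio to a media player"
  trigger:
    - platform: event
      # Events fired from ESPHome devices are prefixed with "esphome."
      event_type: esphome.voice_audio_path
  action:
    - service: media_player.play_media
      target:
        entity_id: media_player.living_room
      data:
        # Assumes the event carries a playable URL or path in "audio_path"
        media_content_id: "{{ trigger.event.data.audio_path }}"
        media_content_type: music
```

The same trigger pattern would apply to the Request Text, Response Text, and Phase ID events, with the action swapped for whatever should display or store the value.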
VoiceScreen is an ESP32-S3 display that acts as a visual companion to VoiceSensor, VoiceEar, and all devices from the Xiaozhi-ESPHome repository.
It waits for incoming events and updates the display accordingly. Touching the screen sends button events back to the selected voice assistant device (e.g., Start / Stop listening).
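
A minimal sketch of that two-way flow in ESPHome, assuming the phase is exposed in Home Assistant as a sensor entity. The entity ID `sensor.voicesensor_phase_id`, the event name `esphome.voicescreen_touch`, and the `action` field are hypothetical, and the display and touchscreen setup are omitted:

```yaml
# Sketch only: entity ID, event name, and data field are hypothetical.
text_sensor:
  - platform: homeassistant
    id: assistant_phase
    entity_id: sensor.voicesensor_phase_id
    on_value:
      then:
        # A real config would switch the displayed image here
        - logger.log:
            format: "Assistant phase is now %s"
            args: ["x.c_str()"]

script:
  - id: send_touch_event
    then:
      # Call this script from the touchscreen handler
      - homeassistant.event:
          event: esphome.voicescreen_touch
          data:
            action: toggle_listening
```

An automation in Home Assistant would then map the touch event to the Start / Stop action on the selected voice assistant device.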
Schematics
- Works only with ESP32-S3 Zero (not SuperMini).
- Zero includes 2MB PSRAM, required for on-device MicroWakeWord.
- Supports up to 2 wake words, processed locally.
- To output responses to an external media player, see the automations.
- Just the "Ear" (no sensors) version of "VoiceSensor"
- Use the original placement of the file (esphome/common/)
- Simple YAML that reads the phase IDs from HA and displays images accordingly.
- Can also control the "virtual touch" of the VoiceSensor by touching the display.
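
For orientation, the on-device wake word setup mentioned above might look roughly like this in ESPHome. This is a fragment, not the repo's actual config: it assumes a microphone with the ID `mic` is defined elsewhere, and the two model names are just examples of wake words ESPHome provides.

```yaml
# Sketch only: assumes a microphone with id "mic" is defined elsewhere.
psram:
  mode: quad        # the 2MB PSRAM on the ESP32-S3 Zero is quad-SPI
  speed: 80MHz

micro_wake_word:
  microphone: mic
  models:
    # Two wake words, both detected locally on the device
    - model: okay_nabu
    - model: hey_jarvis
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

voice_assistant:
  microphone: mic
```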
- ESP32-S3 Zero
- INMP441 microphone
- LD2410C presence sensor
- BH1750 ambient light sensor
- 16mm push button (GPIO 10)
- ESP32-S3 Zero
- INMP441 microphone
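
To show how the parts listed above map onto ESPHome components, here is a wiring sketch. Only the push button pin (GPIO 10) comes from this README; every other pin number, the I2C address, and the active-low button wiring are assumptions to be replaced with the values from the schematics. VoiceEar would need only the `i2s_audio` and `microphone` sections.

```yaml
# Sketch only: all pins except GPIO10 are assumed, not taken from the schematics.
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO3
    i2s_bclk_pin: GPIO2

microphone:
  - platform: i2s_audio        # INMP441
    id: mic
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO4
    adc_type: external
    pdm: false
    channel: left
    bits_per_sample: 32bit

uart:
  id: ld2410_uart              # LD2410C talks over UART at 256000 baud
  tx_pin: GPIO5
  rx_pin: GPIO6
  baud_rate: 256000
  parity: NONE
  stop_bits: 1

ld2410:
  uart_id: ld2410_uart

i2c:
  sda: GPIO8
  scl: GPIO9

sensor:
  - platform: bh1750           # ambient light level in lux
    name: "Illuminance"
    address: 0x23
    update_interval: 30s

binary_sensor:
  - platform: ld2410
    has_target:
      name: "Presence"
  - platform: gpio             # 16mm push button
    name: "Push Button"
    pin:
      number: GPIO10
      mode:
        input: true
        pullup: true
      inverted: true           # assumes the button pulls the pin to ground
```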
- Designed to fit inside a custom enclosure with accessible test pins.
- Minimal soldering required, unless:
  - You need custom power splitters.
  - Pin headers arrive loose or unsoldered.
- 🗣️ Local Voice Assistant: On-device processing with LED phase feedback.
- 👤 Presence Detection: LD2410C millimeter-wave radar sensor.
- 💡 Light Level Monitoring: BH1750 digital ambient light sensor.
- 🔘 Push Button Input: For triggering actions manually.
- 🔧 Maker-Friendly: Fully testable on a breadboard before final assembly.








