Hana is a fully customizable desktop companion that lives on your screen. She isn't just a 3D model: she can hear you, understand you, and speak back to you using advanced local AI.
- LLM Integration: Powered by Ollama (Llama3, Mistral, etc.) for local, private, and intelligent conversations.
- Push-to-Talk (PTT): Global, system-wide PTT keybind (Mouse or Keyboard) support.
- Whisper STT: State-of-the-art speech recognition using OpenAI's Whisper model (runs locally via Python).
- GPT-SoVITS Support: Full integration with the GPT-SoVITS inference engine for high-quality, realistic voice synthesis.
  - Lip-syncs perfectly with the 3D model.
  - Requires manual setup of the GPT-SoVITS engine.
- Transparent Overlay: Renders directly on top of your windows.
- Click-Through: Work seamlessly while she watches you. Toggle interaction with `F8`.
- Smart Tracking: Eyes and head track your mouse cursor naturally.
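To make the LLM feature concrete, here is a hedged sketch of how a conversation turn can be requested from Ollama's local REST API (`/api/generate` on `localhost:11434`). The payload fields match Ollama's documented endpoint; the persona text is a placeholder, not Hana's actual system prompt.

```python
import json
import urllib.request

def build_request(model: str, prompt: str, system: str) -> dict:
    """Assemble a non-streaming generation payload for Ollama's /api/generate."""
    return {
        "model": model,    # e.g. "llama3" or "mistral", pulled via `ollama pull`
        "prompt": prompt,
        "system": system,  # the companion's personality / system prompt
        "stream": False,   # request a single JSON response instead of a stream
    }

def ask(payload: dict, host: str = "http://localhost:11434") -> str:
    """Send the payload to a locally running Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]
```

Keeping everything on `localhost` is what makes the conversations local and private: no text leaves the machine.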
- Node.js (v18+ recommended)
- Python 3.10+ (Required for AI services)
- Ollama: Download Here. Ensure you have pulled a model (e.g., `ollama pull llama3`).
- CUDA-capable GPU (strongly recommended for Whisper & GPT-SoVITS).
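As an illustration (this helper is not part of the repo), checking the prerequisite versions boils down to comparing the major version of whatever `node --version` or `python --version` reports against the minimums above:

```python
def meets_minimum(version: str, minimum_major: int) -> bool:
    """Check a version string like 'v18.17.1' or '3.10.12' against a minimum major version."""
    major = int(version.lstrip("v").split(".")[0])
    return major >= minimum_major
```

For example, `meets_minimum("v18.17.1", 18)` is true for the Node.js requirement, while a v16 install would fail the check.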
```bash
git clone https://github.com/Matthew-IE/hana-project.git
cd hana-project
npm run install:all
```
Hana uses a local Python backend for Speech-to-Text (Whisper).
```bash
cd python
python -m venv venv

# Windows
.\venv\Scripts\activate

# Install requirements
pip install -r requirements.txt
```
Note: You may need to install PyTorch manually with CUDA support if the default install doesn't pick it up.
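A quick way to tell whether the manual PyTorch step is needed is to check whether the installed build can see the GPU. This sketch assumes nothing beyond the standard library until PyTorch is actually present:

```python
import importlib.util

def torch_cuda_status() -> str:
    """Return 'torch-missing', 'cuda', or 'cpu-only' depending on the local install."""
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu-only"

if __name__ == "__main__":
    # 'cpu-only' means the default wheel was installed without CUDA support,
    # so Whisper and GPT-SoVITS will run (slowly) on the CPU.
    print(torch_cuda_status())
```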
Hana does not bundle the TTS engine or voice models. You must set it up manually:
- Download the GPT-SoVITS package (Beta/v2) from the official RVC-Boss/GPT-SoVITS repository or its releases page. Tested on Windows: download the integrated package, follow the instructions in their repo to install the pretrained models, run it once, then continue here.
- Extract the contents of the GPT-SoVITS folder into `hana-project/python/gpt-sovits/`.
- Ensure the structure looks like this:
```
Hana-Project/
├── python/
│   ├── gpt-sovits/
│   │   ├── runtime/       (Python environment if included, or use venv)
│   │   ├── GPT_SoVITS/    (Source code)
│   │   ├── api_v2.py
│   │   └── ...
│   ├── services/
│   ├── main.py
│   └── ...
```
- Important: You must provide your own Reference Audio and Weights (`.pth`/`.ckpt`) for the voice you want to use. Place them in a known location and configure them in the Hana Controller.
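The exact keys the Hana Controller reads are not documented here, so the fragment below is purely illustrative of the kind of settings (weight paths plus a reference clip) you would point at your own files; every path and key name is hypothetical:

```json
{
  "tts": {
    "gpt_weights": "C:/voices/my-voice-e15.ckpt",
    "sovits_weights": "C:/voices/my-voice_e8_s200.pth",
    "reference_audio": "C:/voices/reference.wav",
    "reference_text": "Transcript of the reference clip."
  }
}
```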
If you have issues with Global Hooks (PTT) not working:
```bash
cd hana-companion
npm rebuild --build-from-source
```
Start the entire system with one command:
```bash
npm run dev
```
This launches:
- Hana Core (Electron + Overlay).
- Hana Controller (Web Interface).
- Python AI Backend (Whisper + Audio Capture).
- GPT-SoVITS (If enabled in config, make sure GPT-SoVITS is installed).
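The repo's actual root scripts may differ; one common way for a single `npm run dev` to fan out into several processes is the `concurrently` package, sketched here with hypothetical script names:

```json
{
  "scripts": {
    "dev:core": "npm --prefix hana-companion run dev",
    "dev:controller": "npm --prefix hana-controller run dev",
    "dev:python": "python python/main.py",
    "dev": "concurrently -n core,ui,ai \"npm run dev:core\" \"npm run dev:controller\" \"npm run dev:python\""
  }
}
```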
- `F8`: Toggle Click-Through (click "through" the model vs. dragging the model).
- PTT Key: Configurable in Settings (default: `V` or `Mouse4`). Hold to speak, release to send.
- Indicator: A red "Listening..." pill appears in the top right of the screen when active.
Open the Controller (usually http://localhost:5173), or open the dedicated GUI in her System Tray to:
- Adjust Voice Settings (Select different GPT-SoVITS weights).
- Tweak AI Personality (System Prompt).
- Debug Animations.
- "Startup timed out": The TTS engine might take a minute to load. Check that
python/gpt-sovits/api_v2.pyexists. - Ollama Error: Ensure Ollama is running (
ollama serve) and the model specified inconfig.jsonis pulled.
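Both Ollama checks above can be automated against Ollama's `/api/tags` endpoint, which lists the models currently pulled. A hedged sketch (standard library only; the default model name is an assumption):

```python
import json
import urllib.request

def model_available(tags: dict, name: str) -> bool:
    """True if a model whose name starts with `name` appears in /api/tags output."""
    return any(m.get("name", "").startswith(name) for m in tags.get("models", []))

def check_ollama(model: str = "llama3", host: str = "http://localhost:11434") -> bool:
    """False if the server is down (start it with `ollama serve`) or the model isn't pulled."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
            return model_available(json.load(resp), model)
    except OSError:
        return False
```

Matching on the name prefix covers tagged variants such as `llama3:latest`.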
```
hana-project/
├── hana-companion/      # Electron Desktop Application
│   ├── electron/        # Main process & Window management
│   ├── src/             # Renderer process (Three.js code)
│   └── public/          # Assets (Models, Icons)
│
├── hana-controller/     # Web Dashboard
│   ├── src/             # React UI Components
│   └── resources/       # Controller-specific assets
│
├── python/              # AI Backend
│   ├── gpt-sovits/      # (User Installed) GPT-SoVITS Engine
│   ├── services/        # Whisper & Audio Capture
│   ├── venv/            # Python Virtual Environment
│   └── main.py          # Python Entry Point
│
└── package.json         # Root scripts for monolithic management
```
- Core Companion: VRM rendering on transparent window.
- Smart Tracking: Eye and head tracking with mouse interaction.
- Controller UI: Web-based remote control for settings and debugging.
- Physics & Animations: Bone-based rotation and idle animation system.
- AI Integration: Local LLM connection via Ollama.
- Voice Communication: Speech-to-Text via Whisper (Push-to-Talk).
- Memory System: Context-aware interactions based on past conversations.
- Emotional Engine: Automatic emotion recognition and reaction.
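As an illustration of the smart-tracking item above (this is not Hana's actual code), driving a head bone toward the cursor reduces to converting the cursor's screen offset into yaw/pitch angles, with the head treated as sitting some distance behind the screen plane:

```python
import math

def look_angles(cursor_x: float, cursor_y: float,
                head_x: float, head_y: float, depth: float = 500.0):
    """Return (yaw, pitch) in radians for a head at (head_x, head_y) on screen,
    assumed `depth` pixels behind the screen plane, looking at the cursor."""
    dx = cursor_x - head_x
    dy = cursor_y - head_y
    yaw = math.atan2(dx, depth)     # cursor to the right -> positive yaw
    pitch = math.atan2(-dy, depth)  # screen y grows downward, so negate
    return yaw, pitch
```

In practice the renderer would interpolate each frame toward these targets rather than snapping, which is what makes the motion look natural.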
Contributions are welcome! Please feel free to submit a Pull Request.

