Axel AI Assistant

Real-time AI interview assistant for macOS. Listens to the interview audio (microphone + interviewer voice from Zoom/Teams/Meet), transcribes speech in real time, automatically detects questions, and generates answers in an invisible overlay on top of your screen.

Features

Stealth overlay — invisible to screen sharing (content protection), always on top, transparent
Dual audio capture — microphone (candidate) + system audio (interviewer via BlackHole)
Real-time transcription — local Whisper (whisper.cpp) or cloud Deepgram Nova-3
Auto question detection — 3s pause after interviewer speech triggers answer generation
Streaming AI answers — GPT-4o / Claude with markdown rendering and code highlighting
Screenshot analysis — capture screen and send to Vision model for code/task analysis
Separate mic buffer — candidate's mic doesn't auto-trigger; manual send via F5
Chat-style answer view — scrollable history of all Q&A pairs with auto-scroll
Runtime settings — switch LLM provider/model, transcription provider, upload resume/job description

Architecture

┌──────────────────────────────────────────────────┐
│            Electron Overlay (React + TS)          │
│  Transparent, frameless, always-on-top           │
│  setContentProtection(true) — hidden from share  │
│  Dynamic size: 60% width centered, 80% opacity   │
└──────────┬─────────────────────┬─────────────────┘
           │ SSE (answers,       │ HTTP (commands,
           │  transcripts)       │  settings)
           ▼                     ▼
┌──────────────────────────────────────────────────┐
│            Python Backend (FastAPI)               │
│                                                   │
│  Audio Capture ──→ Whisper/Deepgram ──→ Question │
│  (sounddevice)     (transcription)     Detector  │
│                                           │      │
│  Screenshot ──→ Vision Model              ▼      │
│  (Pillow)       (GPT-4o)          LLM Streaming  │
│                                   (GPT-4o/Claude)│
│                                        │         │
│  Context Manager ◄────────────────────►│         │
│  (conversation history)          SSE → Overlay   │
└──────────────────────────────────────────────────┘

Prerequisites

macOS 13+
Python 3.11+
Node.js 18+
BlackHole 2ch — virtual audio driver for system sound capture
portaudio — audio backend for sounddevice

brew install blackhole-2ch portaudio node

BlackHole Setup (required for interviewer audio)

Open Audio MIDI Setup (Spotlight → "Audio MIDI Setup")
Click "+" → Create Multi-Output Device
Check: Built-in Output + BlackHole 2ch
Right-click → Use This Device For Sound Output
Verify: audio plays normally through speakers

Without BlackHole the app still works in mic-only mode — it captures your microphone and you trigger answers manually with F5.

Installation

# Clone the repo
git clone https://github.com/nkonshin/AxelAiAssistant.git
cd AxelAiAssistant

# Create .env with your API keys
cp .env.example .env
# Edit .env and add your keys

# Run setup (installs Python venv, npm packages, downloads Whisper model)
./scripts/setup.sh

Environment Variables

# Required
OPENAI_API_KEY=sk-...          # OpenAI API key

# Optional — Deepgram (only if using cloud transcription)
DEEPGRAM_API_KEY=...           # Deepgram API key

# Optional — Claude via CLIProxyAPI (Max subscription)
CLI_PROXY_URL=http://localhost:8317/v1
CLI_PROXY_API_KEY=your-api-key-1

# Transcription (default: whisper)
TRANSCRIPTION_PROVIDER=whisper  # "whisper" or "deepgram"
WHISPER_MODEL=large-v3-turbo    # tiny, base, small, medium, large-v3, large-v3-turbo

# LLM (default: openai / gpt-4o-mini)
LLM_PROVIDER=openai             # "openai" or "claude"
LLM_MODEL=gpt-4o-mini

Usage

# Start everything (backend + overlay)
./scripts/dev.sh

Or run manually:

# Terminal 1: Python backend
cd backend && source .venv/bin/activate && python main.py

# Terminal 2: Electron overlay
cd overlay && npm run dev

Background Launch (no terminal window)

# Option A: nohup (close terminal after)
nohup /path/to/AxelAiAssistant/scripts/dev.sh > /dev/null 2>&1 &

# Option B: AppleScript launcher (in ~/Documents/AxelAssistant.app)
# Double-click to launch, no terminal window at all

Stop background processes:

pkill -f "python3 main.py"; pkill -f "electron-vite"

Hotkeys

Shortcut	Action
`Cmd+Shift+\`	Show / hide overlay
`Cmd+Shift+M`	Start / stop recording
`Cmd+Shift+A`	Force answer (send all buffers to LLM)
`F5`	Send mic buffer to LLM (candidate's speech)
`Cmd+Shift+S`	Screenshot → AI Vision analysis
`Cmd+Shift+C`	Copy last answer to clipboard
`Cmd+Shift+I`	Toggle manual input bar
`Cmd+Shift+T`	Toggle click-through mode
`Cmd+Shift+↑`	Increase opacity
`Cmd+Shift+↓`	Decrease opacity

How It Works

Start recording (Cmd+Shift+M) — captures mic + system audio simultaneously
Interviewer speaks — system audio (BlackHole) is transcribed and buffered
3s pause detected — auto-triggers LLM to generate an answer
Answer streams into the overlay with markdown rendering
You speak — mic audio is transcribed but does NOT auto-trigger (prevents echo-answers)
F5 to send mic — manually sends your speech buffer to LLM if needed
Screenshot (Cmd+Shift+S) — captures screen for code/task analysis via Vision model

LLM Providers

Provider	Models	Setup
OpenAI	gpt-4o-mini, gpt-5-mini, gpt-5-nano	`OPENAI_API_KEY` in `.env`
Claude	claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5	CLIProxyAPI + Claude Max subscription

Switch providers at runtime via Settings panel in the overlay.

Transcription Providers

Provider	How	Pros / Cons
Whisper (default)	Local, pywhispercpp (whisper.cpp + Metal GPU)	No API key needed, private. ~0.5s inference per chunk
Deepgram	Cloud, Nova-3 WebSocket streaming	Lower latency, requires API key

API Endpoints

Method	Path	Description
GET	`/health`	Health check
GET	`/stream`	SSE event stream (transcripts, answers, status)
GET	`/status`	Current app state
GET	`/transcript`	Full transcript log
POST	`/start`	Start audio capture + transcription
POST	`/stop`	Stop recording
POST	`/screenshot`	Capture screen → Vision AI analysis
POST	`/force-answer`	Force answer from all buffers
POST	`/trigger-mic`	Send mic buffer to LLM (F5)
POST	`/ask`	Submit manual text question
GET/POST	`/settings/llm`	Get/set LLM provider and model
GET/POST	`/settings/transcription`	Get/set transcription provider and model
GET/POST	`/settings/profile`	Get/set candidate profile
POST	`/settings/profile/upload`	Upload resume (PDF/DOC/DOCX)
POST	`/settings/profile/reset`	Reset profile to template
GET/POST	`/settings/job`	Get/set job description
POST	`/settings/job/upload`	Upload job description file
POST	`/settings/job/reset`	Reset job description to template

Project Structure

AxelAiAssistant/
├── backend/
│   ├── main.py                  # FastAPI app, SSE, audio pipeline
│   ├── config.py                # Environment config
│   ├── audio_capture.py         # Mic + BlackHole audio capture
│   ├── transcription.py         # Deepgram WebSocket client
│   ├── transcription_whisper.py # Local Whisper (pywhispercpp)
│   ├── question_detector.py     # Pause-based auto-trigger + mic separation
│   ├── llm_client.py            # OpenAI + Claude streaming wrapper
│   ├── screenshot.py            # Screen capture (screencapture CLI + Pillow)
│   ├── context_manager.py       # Conversation history
│   ├── profile.md               # Candidate profile (edit before interview)
│   └── job_description.md       # Job description (edit before interview)
├── overlay/
│   ├── src/
│   │   ├── App.tsx              # Root component
│   │   ├── components/
│   │   │   ├── TopBar.tsx       # Status bar (recording, connection)
│   │   │   ├── AnswerView.tsx   # Scrollable chat-style Q&A view
│   │   │   ├── Transcript.tsx   # Compact live transcription (2 lines)
│   │   │   ├── InputBar.tsx     # Manual question input (hidden by default)
│   │   │   └── SettingsPanel.tsx # LLM/transcription/profile settings
│   │   ├── hooks/
│   │   │   ├── useSSE.ts        # SSE connection to backend
│   │   │   └── useHotkeys.ts    # Hotkey handler via IPC
│   │   └── styles/globals.css   # Tailwind + custom overlay styles
│   └── src/main/index.ts        # Electron main process (stealth, hotkeys)
├── scripts/
│   ├── dev.sh                   # Start backend + overlay (dev mode)
│   ├── setup.sh                 # Install all dependencies
│   └── build.sh                 # Build .app + .dmg
├── docs/
│   ├── ARCHITECTURE.md          # Detailed architecture docs
│   └── BUGFIX_NOTES.md          # Debug session notes
└── CLAUDE.md                    # AI assistant instructions

Troubleshooting

No sound from BlackHole: Make sure Multi-Output Device is set as default output in System Settings → Sound.

"BlackHole not found" error: Install with brew install blackhole-2ch and restart. App works without it in mic-only mode.

Overlay visible in screen share: Verify the app uses setContentProtection(true). On macOS 15+ (Sequoia) some apps may use ScreenCaptureKit which can bypass protection — tested OK with Zoom, Google Meet, and Yandex Telemost.

Screenshot permission denied: Add the app (or Terminal) to System Settings → Privacy & Security → Screen Recording. Restart the app after granting permission.

Whisper hallucinations ("Продолжение следует...", "Субтитры сделал..."): These are auto-filtered. If new patterns appear, add regex to transcription_whisper.py.

Missing API keys: Create .env from .env.example. Only OPENAI_API_KEY is required — Whisper runs locally without any API key.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
backend		backend
docs		docs
overlay		overlay
scripts		scripts
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README.ru.md		README.ru.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Axel AI Assistant

Features

Architecture

Prerequisites

BlackHole Setup (required for interviewer audio)

Installation

Environment Variables

Usage

Background Launch (no terminal window)

Hotkeys

How It Works

LLM Providers

Transcription Providers

API Endpoints

Project Structure

Troubleshooting

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Axel AI Assistant

Features

Architecture

Prerequisites

BlackHole Setup (required for interviewer audio)

Installation

Environment Variables

Usage

Background Launch (no terminal window)

Hotkeys

How It Works

LLM Providers

Transcription Providers

API Endpoints

Project Structure

Troubleshooting

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages