A proactive, context-aware AI desktop assistant that continuously listens for speech, watches your screen, and generates intelligent responses.
- 🎤 Voice Recognition - Powered by OpenAI Whisper for accurate speech-to-text
- 👁️ Screen Awareness - OCR-based screen reading and context understanding
- 🧠 Smart AI - Google Gemini integration for intelligent responses
- ⚡ Real-time Processing - Fast response times with optimized inference
- 🎯 Intent Detection - Natural language understanding for various tasks
- 💬 Interactive UI - Beautiful popup windows with actionable responses
- 📊 Chart Analysis - Specialized understanding of data visualizations
- Python 3.8 or higher
- macOS (tested) / Linux / Windows
- Google Gemini API key (free tier available), any other LLM API key, or a local LLM you can run
- Microphone access
- Clone the repository:

  ```bash
  git clone https://github.com/ry2009/cluely4Free.git
  cd cluely4Free
  ```

- Run the automated setup:

  ```bash
  python setup.py
  ```

- Set your API key:

  ```bash
  export GEMINI_API_KEY="your_api_key_here"
  ```

- Start Cluely:

  ```bash
  python main.py
  ```

- "Hey Cluely, what's on my screen?" - Analyzes current screen content
- "Hey Cluely, explain this chart" - Detailed chart and data analysis
- "Hey Cluely, summarize this article" - Web content summarization
- "Hey Cluely, help me with this code" - Programming assistance
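Commands like those above can be routed with simple wake-word and keyword matching. The sketch below is purely illustrative — `detect_intent`, the intent labels, and the keyword table are assumptions, not Cluely's actual code:

```python
from typing import Optional

# Hypothetical wake-word + intent routing via keyword matching.
WAKE_WORD = "hey cluely"

INTENT_KEYWORDS = {
    "screen_summary": ("what's on my screen", "whats on my screen"),
    "chart_analysis": ("chart", "graph", "visualization"),
    "summarize": ("summarize", "summary"),
    "code_help": ("code", "function", "bug"),
}

def detect_intent(transcript: str) -> Optional[str]:
    """Return an intent label if the wake word is present, else None."""
    text = transcript.lower().strip()
    if WAKE_WORD not in text:
        return None  # ignore speech not addressed to the assistant
    command = text.split(WAKE_WORD, 1)[1]
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in command for kw in keywords):
            return intent
    return "general_question"  # fall back to a generic prompt
```

A real system would likely hand ambiguous transcripts to the LLM itself, but a keyword pass like this keeps obvious cases fast and offline.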
- Development: VS Code, Cursor, Terminal, Xcode
- Web Browsing: Chrome, Safari, Firefox
- Writing: Word, Google Docs, Notion, Obsidian
- Communication: Mail, Gmail, Outlook
- Social Media: Twitter/X, LinkedIn
```
cluely/
├── audio/    # Speech recognition & processing
├── vision/   # Screen capture & OCR
├── brain/    # Intent routing & prompt building
├── llm/      # AI model interfaces (Gemini, local models)
├── utils/    # Configuration & performance monitoring
└── main.py   # Application entry point
```
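One way the packages above could fit together is a listen → see → think → answer loop. All names below are illustrative stubs standing in for the real `audio/`, `vision/`, `brain/`, and `llm/` modules, not the project's actual API:

```python
# Hypothetical sketch of the pipeline main.py coordinates.
def listen_and_transcribe():   # audio/: Whisper speech-to-text
    return "hey cluely, what's on my screen?"

def capture_screen_text():     # vision/: screenshot + Tesseract OCR
    return "Quarterly revenue chart, Q1 through Q4"

def route_intent(transcript):  # brain/: wake-word + intent routing
    return "screen_summary" if "hey cluely" in transcript.lower() else None

def build_prompt(intent, transcript, screen_text):  # brain/: prompt building
    return f"[{intent}] User said: {transcript}\nScreen shows: {screen_text}"

def generate_response(prompt):  # llm/: Gemini or a local model
    return "stubbed response for: " + prompt.splitlines()[0]

def run_once():
    """One pass of the listen -> see -> think -> answer loop."""
    transcript = listen_and_transcribe()
    intent = route_intent(transcript)
    if intent is None:
        return None  # wake word not heard; keep listening
    prompt = build_prompt(intent, transcript, capture_screen_text())
    return generate_response(prompt)
```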
Customize Cluely via `cluely_config.json`:

```json
{
  "audio": {
    "listen_duration": 3,
    "silence_threshold": 0.01
  },
  "llm": {
    "max_tokens": 1000,
    "temperature": 0.9
  },
  "ui": {
    "auto_dismiss_time": 15
  }
}
```

Run the comprehensive test suite:

```bash
python main.py test
```

Tests include:
- ✅ Environment validation
- ✅ Microphone functionality
- ✅ Screen capture & OCR
- ✅ Visual parsing
- ✅ LLM connectivity
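The `cluely_config.json` settings shown earlier could be merged over built-in defaults with a small loader like this (an illustrative sketch — `load_config` is a hypothetical helper, not necessarily how Cluely reads its config):

```python
import json
from pathlib import Path

# Defaults mirror the cluely_config.json example above.
DEFAULTS = {
    "audio": {"listen_duration": 3, "silence_threshold": 0.01},
    "llm": {"max_tokens": 1000, "temperature": 0.9},
    "ui": {"auto_dismiss_time": 15},
}

def load_config(path="cluely_config.json"):
    """Merge user overrides from the JSON file over the defaults."""
    config = {section: dict(values) for section, values in DEFAULTS.items()}
    file = Path(path)
    if file.exists():
        for section, values in json.loads(file.read_text()).items():
            config.setdefault(section, {}).update(values)
    return config
```

Merging section by section means a user's file only needs the keys they want to change.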
- Response Time: ~3-4 seconds from speech to AI response
- Audio Processing: OpenAI Whisper (base model)
- AI Inference: Google Gemini 1.5 Flash (optimized for speed)
- Screen Analysis: Tesseract OCR with intelligent filtering
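To sanity-check per-stage numbers like these on your own machine, a minimal timing helper (purely illustrative, not part of the project) could wrap each pipeline stage:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage):
    """Record wall-clock time for one pipeline stage in `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

# Example: time one stage (sleep stands in for Tesseract on a screenshot).
with timed("ocr"):
    time.sleep(0.01)
print(f"ocr took {timings['ocr']:.3f}s")
```

Summing the per-stage entries shows where the end-to-end latency budget actually goes.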
- Local Processing: Audio and screen data processed locally
- API Calls: Only text prompts sent to Gemini (no audio/images)
- No Storage: No conversation history or personal data stored
- Secure: Environment variables for API keys
- Context-aware responses based on active application
- Visual content analysis (charts, graphs, documents)
- Intelligent intent detection from natural speech
- Modular design for easy feature additions
- Support for multiple LLM backends
- Configurable response types and UI behaviors
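Support for multiple LLM backends is commonly done behind a small shared interface; the sketch below shows one possible shape (class and method names are assumptions, not Cluely's actual code):

```python
from abc import ABC, abstractmethod

class LLMBackend(ABC):
    """Hypothetical pluggable backend interface."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 1000,
                 temperature: float = 0.9) -> str:
        """Return the model's text completion for `prompt`."""

class EchoBackend(LLMBackend):
    """Trivial stand-in; a Gemini backend or a local-model backend
    would implement the same `generate` signature."""

    def generate(self, prompt, max_tokens=1000, temperature=0.9):
        return f"(echo) {prompt[:max_tokens]}"

def ask(backend: LLMBackend, prompt: str) -> str:
    # Caller code depends only on the interface, so backends swap freely.
    return backend.generate(prompt)
```

With this shape, adding a new provider means writing one class rather than touching the call sites.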
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for speech recognition
- Google Gemini for AI inference
- Tesseract OCR for screen text extraction
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📧 Email: For private inquiries
Made by ry2009, with suggestions fed from Cluely directly into Cursor
Cluely - Your intelligent desktop companion