VismayVora/VoiceOS

VoiceOS Computer Use (for Mac)

VoiceOS Computer Use lets an LLM-driven assistant control your Mac directly. It runs natively on macOS, acts through native macOS commands and utilities, and can be operated hands-free with voice commands and webcam-based Gesture Control.

Features

  • Gesture Control: Control the assistant with hand gestures via your webcam.
  • Voice Control: Whisper-based speech-to-text for your commands and high-quality neural text-to-speech for the assistant's replies.
  • Headless Mode: Run entirely from the terminal without a browser.
  • Native macOS GUI interaction: No Docker required.
  • Screen capture: Using native macOS commands.
  • Keyboard and mouse control: Through cliclick.
  • Multiple LLM provider support: Anthropic, Bedrock, Vertex.
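Keyboard/mouse control and screen capture rely on two command-line tools. The sketch below is illustrative and not taken from this repository: it builds argument lists for `cliclick` (`c:x,y` clicks at a point, `t:text` types a string) and for macOS's `screencapture` (`-x` suppresses the shutter sound), then runs them with `subprocess`. The helper names are hypothetical. On a real Mac, the terminal running them typically needs Accessibility and Screen Recording permissions.

```python
import subprocess
from pathlib import Path

# Hypothetical helpers illustrating how cliclick and screencapture can be
# driven from Python; they are not this repository's actual functions.


def click_cmd(x: int, y: int) -> list[str]:
    """argv for a left click at screen point (x, y) via cliclick's c: command."""
    return ["cliclick", f"c:{x},{y}"]


def type_cmd(text: str) -> list[str]:
    """argv for typing `text` via cliclick's t: command (one argv item, so spaces are safe)."""
    return ["cliclick", f"t:{text}"]


def screenshot_cmd(path: Path) -> list[str]:
    """argv for a silent (-x) full-screen capture saved to `path`."""
    return ["screencapture", "-x", str(path)]


def run(argv: list[str]) -> None:
    """Execute a command, raising subprocess.CalledProcessError if it fails."""
    subprocess.run(argv, check=True)
```

On macOS with cliclick installed, `run(screenshot_cmd(Path("/tmp/screen.png")))` followed by `run(click_cmd(100, 200))` would capture the screen and then click at (100, 200). Building argv lists separately from running them keeps the commands easy to log and test.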

Prerequisites

  • macOS Sequoia 15.7 or later
  • Python 3.12+
  • Homebrew (for installing additional dependencies)
  • cliclick (brew install cliclick) - Required for mouse/keyboard control
  • portaudio (brew install portaudio) - Required for microphone access
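A quick way to confirm most prerequisites before setup is a small standard-library check like the one below. It is a hypothetical helper, not part of the repository; it looks for the `brew` and `cliclick` executables on `PATH` but does not verify portaudio, since that is a library rather than a command.

```python
import platform
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Map each prerequisite name to whether it appears to be satisfied."""
    return {
        "macOS": platform.system() == "Darwin",
        "Python 3.12+": sys.version_info >= (3, 12),
        "Homebrew": shutil.which("brew") is not None,
        "cliclick": shutil.which("cliclick") is not None,
    }


def missing(results: dict[str, bool]) -> list[str]:
    """Return the names of prerequisites that were not found."""
    return [name for name, ok in results.items() if not ok]
```

Running `print(missing(check_prerequisites()))` prints an empty list when everything it checks is in place.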

Setup Instructions

  1. Clone the repository and navigate to it:
git clone https://github.com/VismayVora/VoiceOS.git
cd VoiceOS
  2. Create and activate a virtual environment:
python3.12 -m venv venv
source venv/bin/activate
  3. Install Python requirements:
pip install -r requirements.txt

Configuration

Create a .env file with the following contents:
API_PROVIDER=anthropic
ANTHROPIC_API_KEY=<key>
WIDTH=800
HEIGHT=600
DISPLAY_NUM=1
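To catch configuration mistakes early, a .env file like the one above can be parsed and sanity-checked with the standard library alone. This is a minimal sketch under assumptions: the accepted provider strings (`anthropic`, `bedrock`, `vertex`) are inferred from the supported-provider list, and the app itself may load and validate its settings differently.

```python
# Assumed provider identifiers, inferred from the README's provider list.
VALID_PROVIDERS = {"anthropic", "bedrock", "vertex"}
REQUIRED_KEYS = ("API_PROVIDER", "WIDTH", "HEIGHT")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def validate_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the config looks OK."""
    problems = [f"missing {k}" for k in REQUIRED_KEYS if k not in env]
    provider = env.get("API_PROVIDER", "").lower()
    if provider and provider not in VALID_PROVIDERS:
        problems.append(f"unknown API_PROVIDER {provider!r}")
    if provider == "anthropic" and not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is required for the anthropic provider")
    for key in ("WIDTH", "HEIGHT"):
        value = env.get(key)
        if value is not None and not (value.isdigit() and int(value) > 0):
            problems.append(f"{key} must be a positive integer")
    return problems
```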

Running the App

Run the gesture-controlled assistant:

source venv/bin/activate
python gesture_control.py

Gesture Controls

The system uses your webcam to detect hand gestures:

  • ✋ Open Palm: Start Listening (The assistant will start listening for your voice command)
  • ✊ Closed Fist: Stop Listening (Finish your command)
  • ✌️ Victory (Peace Sign): Reset History (Clear the conversation context)
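One common way to tell these three gestures apart is to check which fingers are extended. The sketch below assumes a MediaPipe-Hands-style layout of 21 `(x, y)` landmarks in image coordinates (y grows downward) and an upright hand; the README does not say which hand-tracking model or rule VoiceOS uses, so treat the indices and the tip-above-joint test as assumptions. The thumb is ignored for simplicity.

```python
from collections.abc import Sequence
from enum import Enum


class Gesture(Enum):
    OPEN_PALM = "open_palm"      # start listening
    CLOSED_FIST = "closed_fist"  # stop listening
    VICTORY = "victory"          # reset history
    UNKNOWN = "unknown"


# Assumed MediaPipe-Hands-style landmark indices (21 points per hand).
FINGER_TIPS = (8, 12, 16, 20)  # index, middle, ring, pinky fingertips
FINGER_PIPS = (6, 10, 14, 18)  # the matching PIP (middle) joints

Point = tuple[float, float]


def extended_fingers(landmarks: Sequence[Point]) -> list[bool]:
    """Return [index, middle, ring, pinky] flags.

    A finger counts as extended when its tip is above (smaller y than) its
    PIP joint, which only holds for a roughly upright hand.
    """
    return [landmarks[tip][1] < landmarks[pip][1] for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)]


def classify(landmarks: Sequence[Point]) -> Gesture:
    """Map one frame's hand landmarks to a gesture."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    index, middle, ring, pinky = extended_fingers(landmarks)
    if index and middle and ring and pinky:
        return Gesture.OPEN_PALM
    if not (index or middle or ring or pinky):
        return Gesture.CLOSED_FIST
    if index and middle and not ring and not pinky:
        return Gesture.VICTORY
    return Gesture.UNKNOWN
```

A per-frame classifier like this is usually paired with debouncing (for example, acting only when the same gesture is seen for several consecutive frames) so that a hand in transition does not trigger the wrong action.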

Voice Interaction

Once the assistant is listening (after an Open Palm gesture), speak your command, then show a Closed Fist to finish. For example:

  • "Open Safari and search for weather in New York"
  • "Take a screenshot"
  • "Close the calculator"

Screen Size Considerations

We recommend using one of these resolutions for optimal performance:

  • XGA: 1024x768 (4:3)
  • WXGA: 1280x800 (16:10)
  • FWXGA: 1366x768 (~16:9)

Higher resolutions will be automatically scaled down to these targets to optimize model performance.

Acknowledgements

This project builds upon the excellent work in mac_computer_use by deedy.

The core computer-use logic is derived from the original repository; this repository adds the following:

  • Latest Anthropic Integration: Updated to support the latest Anthropic models (Claude 3.5 Sonnet v2) and API structures (beta 2025-01-24).
  • Gesture Control: A completely new camera-based hand gesture recognition system for hands-free interaction.
  • Voice Integration: Integration of high-quality Neural TTS and Whisper-based STT for natural voice conversations.
  • Headless Operation: Optimized for running without a visible UI overlay, suitable for background operation.
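For reference, a request using the 2025-01-24 computer-use beta through the Anthropic Python SDK looks roughly like the sketch below. It is an illustration, not this repository's code: `build_request` and `send` are hypothetical helpers, the model ID is left as a parameter because which models accept the `computer_20250124` tool type depends on Anthropic's current documentation, and only the computer tool is declared.

```python
from typing import Any

BETA_FLAG = "computer-use-2025-01-24"


def build_request(
    model: str,
    messages: list[dict[str, Any]],
    width: int,
    height: int,
    display_number: int = 1,
    max_tokens: int = 1024,
) -> dict[str, Any]:
    """Assemble keyword arguments for client.beta.messages.create()."""
    computer_tool = {
        "type": "computer_20250124",  # tool version paired with the 2025-01-24 beta
        "name": "computer",
        "display_width_px": width,
        "display_height_px": height,
        "display_number": display_number,
    }
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": messages,
        "tools": [computer_tool],
        "betas": [BETA_FLAG],
    }


def send(request: dict[str, Any]) -> Any:
    """Send the request; needs the `anthropic` package and ANTHROPIC_API_KEY."""
    import anthropic  # imported lazily so build_request works without the SDK

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    return client.beta.messages.create(**request)
```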
