Conversation

@richiejp (Owner)

Eventually, Assistant Mode will let you control your desktop with natural speech and also allow a VLM to describe what is on the desktop. We can use MCP servers (tool calls) or a VLM that can locate the coordinates of items on the desktop and click them.

Initially, though, this PR just allows you to speak with an LLM using two-way audio over the OpenAI Realtime API.

For LocalAI support this depends on mudler/LocalAI#6245, which implements the conversational parts of the API before we move on to the tool calls and multi-modal support needed for a full desktop assistant.
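
For reference, below is a minimal sketch of the kind of client-side event flow this targets: a two-way audio exchange over the Realtime WebSocket protocol. The endpoint URL, model name, and API key are placeholders (a LocalAI deployment would substitute its own once the linked PR lands), the event names assume the publicly documented OpenAI Realtime beta events, and it uses the `websocket-client` library for brevity; the actual Assistant Mode code may look quite different.

```python
# Minimal sketch of a two-way audio exchange over a Realtime-style WebSocket API.
# Endpoint, model, key, and the exact event names are assumptions based on the
# documented OpenAI Realtime beta protocol, not this PR's implementation.
import base64
import json

import websocket  # pip install websocket-client

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"  # or a LocalAI endpoint
API_KEY = "sk-..."  # placeholder

ws = websocket.create_connection(
    URL,
    header=[f"Authorization: Bearer {API_KEY}", "OpenAI-Beta: realtime=v1"],
)

# Configure the session for audio in and audio out.
ws.send(json.dumps({
    "type": "session.update",
    "session": {"modalities": ["audio", "text"], "voice": "alloy"},
}))

# Stream captured microphone audio (16-bit PCM) into the input buffer,
# commit it, and ask the model for a spoken response.
pcm_chunk = b"\x00\x00" * 1600  # stand-in for a real capture buffer
ws.send(json.dumps({
    "type": "input_audio_buffer.append",
    "audio": base64.b64encode(pcm_chunk).decode(),
}))
ws.send(json.dumps({"type": "input_audio_buffer.commit"}))
ws.send(json.dumps({"type": "response.create"}))

# Read events until the response finishes, collecting audio deltas for playback.
audio_out = bytearray()
while True:
    event = json.loads(ws.recv())
    if event["type"] == "response.audio.delta":
        audio_out.extend(base64.b64decode(event["delta"]))
    elif event["type"] == "response.done":
        break

ws.close()
```

In practice the capture and playback would run continuously rather than as a single buffered turn, but the session/append/commit/response event loop above is the conversational core that the LocalAI backend needs to serve.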
