LLM Vision OS

LLM Vision OS is a Python application that uses the Gemini 1.5 Flash API for image analysis and Google Cloud Text-to-Speech for speech synthesis. After you click 'start', it takes a screenshot every few seconds (2 seconds by default) for Gemini Flash to analyze. You can then ask questions about anything visible on the screen and get answers.

Features

Capture and analyze screenshots at specified intervals
Real-time speech recognition and synthesis
Export logs of analysis
Simple GUI
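
As an illustration of the log export feature, here is a minimal sketch that writes timestamped analysis results to a text file. The function name `export_log`, the entry shape (timestamp, text) and the line format are all assumptions made for this example; the project's actual log format may differ.

```python
from datetime import datetime
from pathlib import Path


def export_log(entries, path):
    """Write (timestamp, text) pairs to a UTF-8 text file, one entry per line.

    Hypothetical helper: `entries` is an iterable of (datetime, str) pairs.
    Returns the number of lines written.
    """
    lines = [f"[{ts.isoformat(timespec='seconds')}] {text}" for ts, text in entries]
    # Terminate every line with a newline so the file can be appended to later.
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

Calling `export_log([(datetime.now(), "A code editor is open")], "analysis_log.txt")` would produce a one-line file.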

Prerequisites

Python 3.10 or later
Required Python packages (see requirements.txt)
Google Cloud account with the Generative Language API and the Cloud Text-to-Speech API both enabled
Google Cloud API key set as an environment variable named GOOGLE_API_KEY
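
To check the last prerequisite before launching, a small sketch like the following could be used. Only the variable name GOOGLE_API_KEY comes from this README; the helper `load_api_key` is hypothetical and is not part of the project.

```python
import os

ENV_VAR_NAME = "GOOGLE_API_KEY"  # name given in the prerequisites above


def load_api_key(environ=None):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper; pass a dict as `environ` to test without
    touching the real environment.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(ENV_VAR_NAME, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR_NAME} is not set; set it before starting the app.")
    return key
```

Failing early with a readable message tends to be easier to diagnose than an authentication error raised later by an API call.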

Note

Screenshots will be taken and processed every 2 seconds by default. This can be changed in the interface.
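
The interval behavior described above can be sketched as a stoppable loop. This is an assumption about how such a loop might look, not the project's implementation: `run_capture_loop`, `capture` and `max_captures` are hypothetical names, and `capture` stands in for whatever takes and analyzes a screenshot.

```python
import threading


def run_capture_loop(capture, stop_event, interval_seconds=2.0, max_captures=None):
    """Call `capture()` repeatedly, pausing `interval_seconds` between calls.

    Stops when `stop_event` is set or after `max_captures` calls.
    Returns the number of captures performed.
    """
    count = 0
    while not stop_event.is_set():
        capture()
        count += 1
        if max_captures is not None and count >= max_captures:
            break
        # wait() returns True as soon as a stop is requested,
        # so the loop exits without sleeping out the full interval.
        if stop_event.wait(interval_seconds):
            break
    return count
```

Using `threading.Event.wait` instead of `time.sleep` lets a 'stop' button end the loop promptly, and changing `interval_seconds` corresponds to changing the interval in the interface.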


Repository: fbader927/OS-assist
