Gallium is a prototype of a visual agent editor and runtime. It consists of a Python backend that handles execution and a web frontend for editing and interaction.

The core idea is that agents, and LLM development in general, seem to be following the development path that video game AI took: early logic was static and hardcoded (all in the LLM training), and now we're seeing a push toward simple logic loops and state tracking external to the core evaluation. My thinking is that applying more modern game AI techniques like GOAP, Utility AI, and Behavior Trees to the LLM space could lead to more robust and predictable agents, as a harness layer on top of the LLM evaluations. This tool is my quick prototype to test out the idea, starting simple with behavior state machines built somewhat the way one might build a video game AI state machine.

In the spirit of the hackathon, I've done all of the development using Google Gemini 3 Flash and Pro, with very minor human intervention at the source code level. If you dig through the git history, you can find where I restarted the project twice, since I wasn't sure what I wanted to build when I started last week.
Disclaimer: This isn't intended to be a fully finished and polished project. It simply expresses a fully working idea, to get it into people's hands and let them experiment and figure out what works for them. It's all open source and free, so have fun.
For all of my projects, privately and with my friends, we pick a placeholder name as soon as possible; the name doesn't have to mean anything as far as the project goes. This makes it easy to discuss the project as an object without having to describe it fully to make sure we're on the same page. We pick plant species, elements, or really any name. For this particular one, I took a quick look at the periodic table for a metal we haven't used yet.
Gallium allows users to:
- Define Agents: Create AI agents with specific roles and state machines.
- Build Workflows: Connect multiple agents to work together on tasks.
- Visual Programming: Use a node-based editor to define logic, function calls, and LLM interactions.
- Simulate & Chat: Chat with running workflows in real-time.
The project has two main parts:
- Backend: A Python engine that manages state, runs graphs, and talks to LLMs.
- Frontend: A web application for the visual editors and chat interface.
- Communication: They talk to each other via WebSockets.
- User Interaction: The user does something in the web app (like sending a message).
- WebSocket Event: The frontend sends a JSON message to `web_server.py`.
- Message Handling: `message_handler.py` sends the command to `SimulationState`.
- State Update: `SimulationState` processes the request and might tell `GraphInterpreter` to run a graph.
- Graph Execution: `GraphInterpreter` goes through the nodes and executes logic (like calling an LLM).
- Event Stream: Updates are sent back to the frontend so the user can see what's happening.
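To make the flow above concrete, here is a minimal sketch of the dispatch step. The command name, payload fields, and method names are illustrative assumptions, not Gallium's actual wire protocol:

```python
import json

class SimulationState:
    """Stand-in for the real SimulationState engine (methods are hypothetical)."""
    def __init__(self):
        self.events = []

    def handle_user_message(self, thread_id, text):
        # The real engine would hand work to the GraphInterpreter; here we
        # just record an event that would be streamed back to the frontend.
        self.events.append({"type": "user_message", "thread": thread_id, "text": text})
        return self.events[-1]

def route_message(state, raw):
    """Decode a WebSocket frame and route it to the engine (the message_handler role)."""
    msg = json.loads(raw)
    if msg.get("command") == "send_message":
        return state.handle_user_message(msg["thread_id"], msg["text"])
    raise ValueError(f"unknown command: {msg.get('command')}")

state = SimulationState()
event = route_message(state, '{"command": "send_message", "thread_id": "t1", "text": "hi"}')
print(event["type"])  # user_message
```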
Agents are defined as state machines. Each agent has:
- States: Different modes an agent can be in (e.g., "Planning", "Executing"), all user defined.
- Transitions: Evaluated condition expressions for moving between states.
- Functions: Logic that runs "ticks" when in a specific state.
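The three pieces above can be sketched as a tiny state machine in plain Python. The state names, condition functions, and per-state tick functions here are illustrative, not Gallium's actual agent file format:

```python
class Agent:
    """Minimal state machine: per-state tick functions plus condition-driven transitions."""
    def __init__(self, start, tick_fns, transitions):
        self.state = start
        self.tick_fns = tick_fns        # state -> function run each tick
        self.transitions = transitions  # state -> [(condition_fn, next_state)]

    def tick(self, memory):
        self.tick_fns[self.state](memory)
        for condition, next_state in self.transitions.get(self.state, []):
            if condition(memory):
                self.state = next_state
                break

agent = Agent(
    start="Planning",
    tick_fns={
        "Planning": lambda m: m.setdefault("plan", ["step1", "step2"]),
        "Executing": lambda m: m["plan"] and m["plan"].pop(0),
    },
    transitions={
        "Planning": [(lambda m: "plan" in m, "Executing")],
        "Executing": [(lambda m: not m["plan"], "Planning")],
    },
)

memory = {}
agent.tick(memory)   # Planning tick creates a plan, then the transition fires
print(agent.state)   # Executing
```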
Logic is defined using visual "Functions". These are graphs where:
- Nodes: Represent actions (e.g., "Send LLM Message", "Set Variable", "Greater Than").
- Connections: Show the order of execution.
- The `GraphInterpreter` runs these graphs step-by-step.
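A rough sketch of that step-by-step execution. The node types and field names (`set_var`, `greater_than`, `next`) are invented for illustration; the real `GraphInterpreter` supports many more node kinds, including LLM calls:

```python
def run_graph(nodes, start, variables=None):
    """Walk nodes by their 'next' connection, executing each one in order."""
    variables = variables or {}
    node_id = start
    while node_id is not None:
        node = nodes[node_id]
        if node["type"] == "set_var":
            variables[node["name"]] = node["value"]
        elif node["type"] == "greater_than":
            variables[node["out"]] = variables[node["a"]] > variables[node["b"]]
        node_id = node.get("next")
    return variables

nodes = {
    "n1": {"type": "set_var", "name": "x", "value": 5, "next": "n2"},
    "n2": {"type": "set_var", "name": "y", "value": 3, "next": "n3"},
    "n3": {"type": "greater_than", "a": "x", "b": "y", "out": "result", "next": None},
}
print(run_graph(nodes, "n1")["result"])  # True
```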
- Workflow: A high-level definition of which agent state machine to use when starting a thread, and which "roles" map to which model or provider.
- Thread: A running instance of a workflow. It has its own memory and message history.
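The relationship between the two might look like this. The field names and the role/provider strings are assumptions for the sketch, not Gallium's actual schema:

```python
# A workflow: entry-point agent plus role -> provider/model mapping.
workflow = {
    "name": "code_review",
    "router_agent": "router",  # the agent state machine that starts the thread
    "roles": {
        "planner": {"provider": "google", "model": "some-model-tag"},
        "coder": {"provider": "local", "model": None},  # model tag is optional
    },
}

def start_thread(workflow, thread_id):
    """A thread is a running instance of a workflow: its own memory and history."""
    return {
        "id": thread_id,
        "workflow": workflow["name"],
        "memory": {},
        "messages": [],
    }

thread = start_thread(workflow, "t1")
print(thread["workflow"])  # code_review
```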
The SimulationState class tracks everything. It manages:
- Active threads and their memory.
- Global variables ("Blackboard").
- The event log.
- The execution tick counter.
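The bookkeeping listed above can be pictured roughly like this; the attribute and method names are illustrative, not the real class's API:

```python
class SimulationState:
    """Sketch of the central engine's bookkeeping: threads, blackboard, log, ticks."""
    def __init__(self):
        self.threads = {}      # thread_id -> per-thread memory and message history
        self.blackboard = {}   # global key/value store shared across agents
        self.event_log = []
        self.tick = 0

    def log(self, event):
        self.event_log.append({"tick": self.tick, "event": event})

    def step(self):
        """Advance one tick; the real engine would tick each thread's agent here."""
        self.tick += 1
        self.log("tick")

sim = SimulationState()
sim.step()
sim.step()
print(sim.tick)  # 2
```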
| File | Description |
|---|---|
| `main.py` | Starts the `SimulationState` and the web server. |
| `simulation_state.py` | The main engine. Manages ticks, threads, and events. |
| `graph_interpreter.py` | Runs the node graphs. Handles variables and flow control. |
| `web_server.py` | Web server that handles WebSocket connections and serves files. |
| `function_manager.py` | Saves and loads agent and function files. |
| `message_handler.py` | Routes messages from the frontend to the backend. |
| `local_llm.py` | Client for local LLMs (like llama.cpp). |
| `gemini_llm.py` | Client for the Gemini API. |
| `blackboard.py` | Shared key-value store for global variables. |
| `struct_manager.py` | Manages custom data structures. |
| Directory/File | Description |
|---|---|
| `index.html` | The main HTML file. |
| `js/app.js` | Main frontend logic. Handles connection and UI updates. |
| `node_editor/` | Code for the visual graph editor. |
| `agent_editor/` | Code for the agent state machine editor. |
| `css/` | Styles for the application. |
- Visual Graph Editor: Build logic flows visually.
- LLM Support: Connect to OpenAI, Anthropic, Gemini, or local models.
- Real-time Debugging: See events and logs as they happen.
- Workflow Management: Save and load different agent setups.
- Memory: Agents remember context during a thread.
This project was primarily built on NixOS using a development flake for the dependencies.
It's just a Python project; I'm sure if you pointed your favorite LLM at the repo it could get a venv or uv setup going for you.
Use `nix develop --command` to enter the shell.
Or run `run.sh <project_path_to_work_in>` to start the server, then open the localhost URL in your browser. Make sure you start from within the `gallium` folder so the `graphs` folder can be found with all of the default graphs I've made.
The server defaults to port 8081.
At the moment all of the providers are implemented, but only Local and Google have been tested.
API keys are saved as plaintext in the `gallium/connections.json` file.
First off, here is the landing page when you load up the server:

The very last tab at the top middle is the LLM Connections tab.
Set up whatever LLM connections you want to use.

Here on the second tab, Workflows, we can see simple descriptions of each workflow: its name, the primary "router" agent (which you can think of as the entry point), and a set of Workflow Roles with names, providers, and optional model tags.

Next up is the actual implementation of the router agent, in the Agent Editor tab.
You can see that it's a simple finite state machine with a green Start node and several other nodes branching off of it to form a loop.
Here's a slightly more complex one, as an idea of what I'm envisioning for the future as workflows get more complex:

Here's the logic that backs the Implement stage of the agent. It's basically a Ralph-loop tick: ingest the plan, look for work, do it, then close. It always starts with a fresh, empty context.

Here's the Type Database window, open to the struct editor.
User structs have fields set to whatever type you need to store.

Same as before, but this is the enum editor, which lets you set string enum constants.

In the context menu, hovering over a function gives a tooltip describing it: its description, inputs, and outputs. Here we're looking at the LLM Eval node, which is the single-shot send-and-one-response mode (with tools supported, including tools created as graph functions).

When you open the context menu, the search box is focused; typing searches for whatever function name or tag is close to what you're looking for, and hitting Enter spawns the top result.

Another useful node lets you run subprocesses. This can be handy for hooking up a subprocess you have as a single-shot tool for the LLM to call.

This node was a late addition. One usage idea: fire off a web request to some other service you have and ingest the result into the tool call, or just send yourself an email notifying you that the work is done.
