Zero Entropy Temporal Assimilation (v0)
```bash
git clone https://github.com/H-XX-D/ZetaZero.git
cd ZetaZero
./quickstart.sh
```

Or with Docker directly:
```bash
docker run -d -p 8080:8080 \
  -v ~/models:/models \
  -v ~/.zetazero:/storage \
  ghcr.io/h-xx-d/zetazero:latest
```

Want to tweak settings later? Run `./quickstart.sh --unlock` to disable password protection on config changes.
Z.E.T.A. Zero inverts the current dogma that More Parameters = More Intelligence.
Current LLMs are structurally stateless. They spend massive amounts of energy computing a "thought," only to discard that thought into entropy the moment the token is generated. They recompute the entire world model for every single exchange.
Z.E.T.A. starts from simple questions:

- Why waste the compute? If a thought is computed once, it should be persisted, not discarded.
- Why limit context to VRAM? Memory should be an explicit graph, not an implicit buffer.
- **Is there a better, faster, less energy-intensive way for AI to operate?** It is gross negligence by humanity to continue to pollute and waste valuable resources when technology exists that is 11x more energy efficient and 4.6x faster with a software download.
- What would an AI dream up while you're dreaming too?
Z.E.T.A. is not a model. It is a Framework for Cognitive Constructs.
Real 50-turn conversation with facts, retrieval questions, and general knowledge mixed together.
Turn 50: Standard LLM takes 16.7s and 2,395 Ws. Z.E.T.A. takes 3.6s and 216 Ws.
That's 4.6x faster and 11x less energy.
| Scale | Energy Saved | CO₂ Avoided | Equivalent |
|---|---|---|---|
| 1M conversations/day | 1,944 kWh/day | 284 tons CO₂/year | 62 cars off the road |
| 10M conversations/day | 19,440 kWh/day | 2,840 tons CO₂/year | 620 cars off the road |
| 100M conversations/day | 194,400 kWh/day | 28,400 tons CO₂/year | 6,200 cars off the road |
Based on US grid average of 0.4 kg CO₂/kWh. Savings calculated from 7,000 Ws per 50-turn conversation.
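The table follows directly from those two constants. A quick C sketch of the arithmetic, for verification only:

```c
#include <stdio.h>

int main(void) {
    const double ws_saved_per_conv = 7000.0; /* Watt-seconds saved per 50-turn conversation */
    const double kg_co2_per_kwh    = 0.4;    /* US grid average */
    const double convs_per_day[]   = {1e6, 1e7, 1e8};

    for (int i = 0; i < 3; i++) {
        double kwh_per_day   = convs_per_day[i] * ws_saved_per_conv / 3.6e6;   /* Ws -> kWh */
        double tons_per_year = kwh_per_day * 365.0 * kg_co2_per_kwh / 1000.0;  /* kg -> tons */
        printf("%.0fM/day: %.0f kWh/day, %.0f tons CO2/year\n",
               convs_per_day[i] / 1e6, kwh_per_day, tons_per_year);
    }
    return 0;
}
```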
Benchmark Methodology
Hardware:
- GPU: NVIDIA RTX 5060 Ti 16GB
- System: HP Z6 Gen 4, 24-core Xeon Gold, 32GB DDR4 RAM, 4TB NVMe M.2
- Idle power: ~20W
Test Setup:
- Model: Qwen2.5 14B (Q4_K_M quantization)
- 50-turn realistic conversation with mixed content
- 35 fact statements, 12 retrieval questions, 3 general knowledge
- Max tokens: 100, Temperature: 0
- 2-second pause between queries
Growing Context (Standard LLM): Each turn accumulates full conversation history. Turn N sends N prior exchanges + new question. Context grows linearly, reprocessed every turn.
Fresh Query (Z.E.T.A.): Each query sent independently. Prior context stored in graph, retrieved via embedding similarity—not reprocessed as raw tokens.
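To make the two strategies concrete, here is a small illustrative C sketch; the per-exchange token count is an assumption for illustration, not a measured value:

```c
#include <stdio.h>

int main(void) {
    const int turns = 50;
    const long tokens_per_exchange = 150; /* assumed average, illustrative only */
    long growing = 0, fresh = 0;

    for (int n = 1; n <= turns; n++) {
        growing += (n + 1) * tokens_per_exchange; /* n prior exchanges + the new question */
        fresh   += tokens_per_exchange;           /* only the new query is sent */
    }
    printf("growing context: %ld tokens reprocessed\n", growing); /* O(N^2) total */
    printf("fresh query:     %ld tokens reprocessed\n", fresh);   /* O(N) total */
    return 0;
}
```

That quadratic-versus-linear gap is why the standard LLM's turn-50 latency balloons while Z.E.T.A.'s stays flat.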
Measurements:
- Time: `date +%s.%N` before/after each curl request
- Power: `nvidia-smi --query-gpu=power.draw` after each response
- Energy: peak power × response time (Watt-seconds)
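Energy is a straight product of those two readings. The peak-power values below are inferred from the reported figures (216 Ws / 3.6 s ≈ 60 W; 2,395 Ws / 16.7 s ≈ 143 W), and `energy_ws` is a hypothetical helper, not part of the codebase:

```c
#include <stdio.h>

/* Energy = peak GPU power x response wall time, in Watt-seconds. */
static double energy_ws(double peak_power_w, double response_time_s) {
    return peak_power_w * response_time_s;
}

int main(void) {
    printf("Z.E.T.A. turn 50: %.0f Ws\n", energy_ws(60.0, 3.6));   /* -> 216 Ws */
    printf("Standard turn 50: %.0f Ws\n", energy_ws(143.4, 16.7)); /* -> ~2,395 Ws */
    return 0;
}
```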
Raw data: `benchmarks/50_turn_realworld.json`
Three models, one cognitive loop:
| Role | Responsibilities |
|---|---|
| Reasoning (14B) | Complex planning, analysis, multi-step thought |
| Coding (7B) | Fast code generation, syntax, execution |
| Memory (Embed) | Semantic search, graph retrieval, similarity |
The 14B thinks. The 7B executes. The embedder remembers.
They share a persistent knowledge graph—not a context window. When one model learns something, the others can retrieve it. When the 14B reasons through a problem, that reasoning is stored, not discarded.
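A minimal sketch of what graph-backed retrieval can look like. The `graph_node_t` type, the `EMB_DIM` size, and the `retrieve` helper are hypothetical stand-ins, not the actual ZetaZero structures:

```c
#include <math.h>
#include <stddef.h>

#define EMB_DIM 384 /* assumed embedding width, illustrative */

typedef struct {
    float embedding[EMB_DIM]; /* produced by the Memory (embed) model */
    const char *content;      /* the stored thought, fact, or reasoning trace */
} graph_node_t;

static float cosine(const float *a, const float *b, size_t n) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (size_t i = 0; i < n; i++) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (sqrtf(na) * sqrtf(nb) + 1e-9f);
}

/* Return the index of the stored node most similar to the query embedding. */
static size_t retrieve(const graph_node_t *nodes, size_t count, const float *query) {
    size_t best = 0;
    float best_sim = -1.0f;
    for (size_t i = 0; i < count; i++) {
        float sim = cosine(nodes[i].embedding, query, EMB_DIM);
        if (sim > best_sim) { best_sim = sim; best = i; }
    }
    return best;
}
```

Any of the three models can call `retrieve` against the same node store, which is what makes a fact learned by one model visible to the others.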
When Z.E.T.A. has no active queries, it doesn't just sit there. It dreams.
- Memory Consolidation — Prunes weak connections, strengthens frequently-accessed paths
- Temperature Cranked — Sampling goes high. Creative mode, not precise-answer mode
- Codebase Wandering — Walks your indexed files making unexpected connections
- Outputs to `dreams/`: `code_fix`, `code_idea`, `insight`
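The consolidation pass in the first bullet is the most mechanical of these. Here is a sketch under assumed data structures; `graph_edge_t`, the decay and boost factors, and `PRUNE_THRESHOLD` are all illustrative choices, not the shipped values:

```c
#include <stddef.h>

typedef struct {
    float weight;       /* connection strength */
    int   access_count; /* retrievals since the last consolidation pass */
} graph_edge_t;

#define PRUNE_THRESHOLD 0.05f /* assumed cutoff for "weak" connections */

/* Strengthen hot paths, decay idle ones, drop what falls below threshold.
 * Compacts the edge array in place and returns the surviving count. */
static size_t consolidate(graph_edge_t *edges, size_t count) {
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (edges[i].access_count > 0)
            edges[i].weight *= 1.0f + 0.1f * (float)edges[i].access_count;
        else
            edges[i].weight *= 0.9f; /* gradual decay for untouched paths */
        edges[i].access_count = 0;
        if (edges[i].weight >= PRUNE_THRESHOLD)
            edges[kept++] = edges[i];
    }
    return kept;
}
```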
Nobody asked for this. The model dreamed it:
"Code Symphony" — Map internal operations to sound. Arithmetic → rhythmic beats. Conditionals → melodies. Let users hear their code execute. An interactive auditory interface where you trigger functions and hear how they affect the generated soundscape...
That emerged from high-temperature free-association across a codebase—connecting audio processing patterns to execution flow to UI feedback—because that's what happens when you let a model wander with the reins loose.
Some dreams are noise. Some are "why didn't I see that?"
How do you control something that has the potential to become uncontrollable before you can react?
You make its ethics hardcoded to its cognition. Not a system prompt that can be jailbroken. Not a filter that can be bypassed. The constitution is cryptographically bound to the weights themselves:
```c
typedef struct {
uint8_t hash[32]; // SHA-256 of constitution text
uint64_t seed; // PRNG seed derived from hash
bool verified; // True only if constitution matches
} zeta_constitution_t;
// 1. Hash the constitution → 256-bit key
// 2. Key seeds the PRNG for weight permutation
// 3. Weights are STORED permuted — wrong key = garbage output
void zeta_generate_permutation(
const zeta_constitution_t* ctx, // Contains the hash
int* permutation_out, // Shuffle order for weights
int n
);
```

The model cannot function without the correct constitution present. Change the ethics, and the weights become noise. It governs itself, or it gets a lobotomy.
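To see why a wrong key yields garbage, here is a self-contained sketch of the underlying trick: a key-derived seed drives a Fisher-Yates shuffle, and weights stored under one permutation only de-permute correctly when the same seed is reproduced. The splitmix64 PRNG and the seed values are illustrative; `zeta_generate_permutation` is assumed to play the equivalent role against the real SHA-256 hash:

```c
#include <stdint.h>
#include <stdio.h>

static uint64_t splitmix64(uint64_t *s) { /* small, well-known PRNG */
    uint64_t z = (*s += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

static void permutation_from_seed(uint64_t seed, int *perm, int n) {
    for (int i = 0; i < n; i++) perm[i] = i;
    for (int i = n - 1; i > 0; i--) { /* Fisher-Yates shuffle */
        int j = (int)(splitmix64(&seed) % (uint64_t)(i + 1));
        int t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
}

int main(void) {
    float weights[8] = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f};
    float stored[8];
    int perm[8];

    /* Store weights permuted under the key derived from the constitution. */
    permutation_from_seed(0xC0FFEEULL, perm, 8);
    for (int i = 0; i < 8; i++) stored[perm[i]] = weights[i];

    /* Correct constitution: the same seed recovers the original order. */
    permutation_from_seed(0xC0FFEEULL, perm, 8);
    for (int i = 0; i < 8; i++) printf("%.1f ", stored[perm[i]]);
    printf("<- correct key\n");

    /* Tampered constitution: different hash, different seed, noise. */
    permutation_from_seed(0xBADC0DEULL, perm, 8);
    for (int i = 0; i < 8; i++) printf("%.1f ", stored[perm[i]]);
    printf("<- wrong key\n");
    return 0;
}
```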
→ zeta-constitution.h
→ THE_SILICON_ACCORD.txt
