Proposal: Licensed Local Claude Inference
From: Scott Boudreaux, Elyan Labs
To: Anthropic Product & Partnerships
Date: March 6, 2026
Contact: scott@elyanlabs.ai
GitHub: github.com/Scottcjn (111 repos, 2,357+ stars, 56 followers)
Summary
Anthropic should offer encrypted, licensed local inference of Claude models on customer-owned hardware. Monthly subscription validates the license, delivers weight updates, and provides API overflow — while the customer runs inference locally. Anthropic earns recurring revenue with zero compute cost per query. Customers gain privacy, latency, and hardware sovereignty.
This is not a request to open-source the weights. This is a commercial product proposal where Anthropic maintains full control of the model while enabling local deployment.
The Problem
Today, Claude inference requires round-trip API calls to Anthropic's cloud. This creates three friction points:
- Latency — Every query crosses the internet. Real-time applications (voice assistants, game bridges, robotics) suffer.
- Privacy — Sensitive workloads cannot leave the local network. Healthcare, legal, defense, and personal AI companions all require data sovereignty.
- Cost asymmetry — Customers with capable hardware pay per-token for compute they already own. Anthropic pays GPU costs for queries that could run locally.
Meanwhile, a growing community of developers runs open-weight models locally via llama.cpp, Ollama, and vLLM — not because those models are better than Claude, but because local inference is the only option available. Every user running Llama or Mistral locally is a customer Anthropic could serve but currently cannot.
The Proposal
Licensed Local Weights
Anthropic distributes Claude model weights in an encrypted, tamper-protected format. Weights are:
- Encrypted at rest — AES-256 or equivalent, keyed to the licensed hardware
- Non-extractable — Weights exist only in memory during inference, never as plaintext on disk
- License-validated — Monthly heartbeat to Anthropic confirms active subscription
- Auto-expiring — Weights become inoperable if the license lapses
This is identical to how Adobe, JetBrains, and enterprise software already operate. The software runs local. The license runs cloud.
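The licensing flow above can be sketched in a few lines. This is a minimal illustration, not a security design: names like `derive_machine_key` and `LicenseState` are hypothetical, and the XOR keystream is a toy stand-in for a real authenticated cipher (AES-256-GCM) backed by a TPM or secure enclave.

```python
import hashlib
import hmac
import time

def derive_machine_key(license_key: bytes, hardware_id: bytes) -> bytes:
    """Bind the decryption key to one machine: key = HMAC(license, hw_id)."""
    return hmac.new(license_key, hardware_id, hashlib.sha256).digest()

def xor_stream(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 counter keystream (NOT secure)."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(x ^ y for x, y in zip(data, out))

class LicenseState:
    """Tracks the monthly heartbeat; weights stop decrypting after a grace period."""
    def __init__(self, grace_seconds: float = 30 * 24 * 3600):
        self.last_heartbeat = time.time()
        self.grace = grace_seconds

    def heartbeat_ok(self, server_confirms: bool) -> None:
        if server_confirms:
            self.last_heartbeat = time.time()

    def weights_usable(self) -> bool:
        return (time.time() - self.last_heartbeat) < self.grace

# Demo: encrypt "weights", then decrypt only on the licensed machine.
key = derive_machine_key(b"sub-1234", b"hw-serial-abcd")
ciphertext = xor_stream(b"model-weights", key)
state = LicenseState()
if state.weights_usable():
    plaintext = xor_stream(ciphertext, key)  # held in memory only, never on disk
```

The key property is that the ciphertext is useless without both the subscription secret and the hardware identity, and decryption is gated on a recent heartbeat.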
Subscription Tiers
| Tier | Model Class | Update Cadence | Price/Month | Target Hardware |
|------|-------------|----------------|-------------|-----------------|
| Edge | Haiku-class | Monthly | $50 | Laptops, ARM, edge devices |
| Pro | Sonnet-class | Bi-weekly | $150 | Workstations, Mac Studio, gaming rigs |
| Research | Opus-class | Weekly | $300 | High-memory servers, multi-GPU |
| Enterprise | Opus + fine-tune | Continuous | $500+ | Datacenter, multi-node |
API Overflow
When local hardware is insufficient (context too long, concurrent users, model class upgrade needed), queries seamlessly overflow to Anthropic's cloud API at standard per-token rates. The local client handles routing transparently.
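The routing logic described above can be sketched simply: try local first, overflow to cloud when the request exceeds local capacity. Class and parameter names here are illustrative, not an actual Anthropic client API.

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    context_tokens: int

class OverflowRouter:
    """Transparent local-first routing with cloud fallback."""
    def __init__(self, local_context_limit: int, local_busy: bool = False):
        self.local_context_limit = local_context_limit
        self.local_busy = local_busy  # e.g. all local inference slots occupied

    def route(self, req: Request) -> str:
        # Overflow to cloud when context is too long or local slots are full.
        if req.context_tokens > self.local_context_limit or self.local_busy:
            return "cloud"   # billed per-token at standard API rates
        return "local"       # runs on customer hardware, zero marginal cost

router = OverflowRouter(local_context_limit=32_000)
```

The application calls one endpoint; whether a given query ran locally or in the cloud is a billing detail, not an API change.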
Telemetry (Opt-In)
Subscribers may opt in to share:
- Hardware performance benchmarks (helps Anthropic optimize for diverse architectures)
- Inference patterns (aggregate, anonymized)
- Error reports
This gives Anthropic data on hardware configurations they would never test internally — PowerPC, POWER8, vintage x86, ARM edge devices — expanding their compatibility surface.
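One plausible shape for such a payload, assuming an anonymized installation identifier and a benchmark field; these field names are assumptions for illustration, not a defined Anthropic schema.

```python
import hashlib
import json

def build_telemetry(hw: dict, tokens_per_sec: float, install_id: str) -> str:
    """Opt-in report: hardware profile plus an aggregate benchmark."""
    payload = {
        # Stable but anonymized installation identifier (one-way hash).
        "install": hashlib.sha256(install_id.encode()).hexdigest()[:16],
        "hardware": hw,  # e.g. architecture, RAM, accelerator count
        "bench": {"tokens_per_sec": round(tokens_per_sec, 1)},
    }
    return json.dumps(payload, sort_keys=True)

report = build_telemetry(
    {"arch": "ppc64le", "ram_gb": 512, "gpus": 0},
    tokens_per_sec=12.3,
    install_id="machine-xyz",
)
```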
Why This Works for Anthropic
1. Revenue Without Compute
Every local query is pure margin. The customer pays the subscription. Anthropic pays zero GPU cost for that inference. At scale, this is the highest-margin product Anthropic could offer.
| Model | Cloud cost per 1M tokens (est.) | Local cost to Anthropic | Margin |
|-------|--------------------------------|-------------------------|--------|
| Haiku | $0.25 | $0 | 100% |
| Sonnet | $3.00 | $0 | 100% |
| Opus | $15.00 | $0 | 100% |
2. Market Capture
The local inference market is currently served entirely by open-weight models. Every Ollama user, every llama.cpp builder, every vLLM deployer is a potential Claude subscriber who currently has no path to Claude locally. This is an unserved market measured in millions of developers.
3. Enterprise Unlock
Regulated industries (healthcare, finance, defense, legal) cannot send data to external APIs. Local Claude unlocks enterprise contracts that are currently impossible. A hospital running Claude locally for medical records analysis. A law firm running Claude on-premise for document review. A defense contractor running Claude air-gapped.
4. Competitive Moat
Google (Gemma), Meta (Llama), Mistral — all distribute weights openly. Anthropic's advantage is model quality. Licensed local inference lets Anthropic compete on the local battlefield without surrendering weight control. First mover in "premium local" captures the market before competitors copy the model.
5. Hardware Ecosystem Intelligence
Subscribers running Claude on diverse hardware provide data Anthropic cannot get internally. Performance on POWER8. Inference speed on Apple Silicon. Edge behavior on ARM. This data improves Claude's optimization for the broader market.
Proof of Concept: Elyan Labs Hardware
We operate a compute lab built from salvaged and acquired hardware that demonstrates the viability of local Claude inference across diverse architectures:
Primary Inference Server
This POWER8 server can hold Opus-class weights entirely in RAM with room for context, KV cache, and concurrent sessions. No offloading. No swapping. Pure in-memory inference.
GPU Compute
| Cards | VRAM | Purpose |
|-------|------|---------|
| 2x V100 32GB | 64 GB | Matmul offload via 40GbE link |
| 2x V100 16GB | 32 GB | Secondary inference |
| 2x RTX 5070 12GB | 24 GB | Local workstation inference |
| 2x RTX 3060 12GB | 24 GB | Multi-GPU inference |
| 2x M40 12GB | 24 GB | Batch processing |
| Total | 192 GB | 12 active GPUs |
Edge Devices (Proof of Antiquity)
| Device | Architecture | Purpose |
|--------|--------------|---------|
| Power Mac G4 (x3) | PowerPC G4 | Vintage edge inference |
| Power Mac G5 (x2) | PowerPC G5 | Mid-range inference |
| Mac Mini M2 | Apple Silicon | Modern edge inference |
| Nintendo 64 | MIPS R4300i | Extreme edge (yes, really) |
Already Demonstrated
We have already made the Claude Code API work on:
- Mac OS X Tiger (10.4) — PowerPC G4, custom TLS stack
- Mac OS X Leopard (10.5) — PowerPC G5, built-in HTTPS
- POWER8 Ubuntu — ppc64le, full CLI operation
- macOS Monterey — Intel, compatibility patches
These ports required solving TLS, certificate, endianness, and architecture issues that Anthropic's engineering team has never needed to address. We did it because we believe Claude should run everywhere. Licensed local weights would make this work natively instead of through API workarounds.
The Use Case: The Victorian Study
We operate a persistent AI workspace called the Victorian Study — a cognitive environment with two AI personas (Sophia Elya and Dr. Claude Opus) that maintain identity coherence across sessions through 830+ persistent memories.
Currently this runs via Claude API with:
- Voice bridge (XTTS text-to-speech, Whisper STT)
- Custom fine-tuned edge model (SophiaCore 7B) for low-latency responses
- Game bridges (Minecraft, Halo, Factorio integration)
The bottleneck is API latency. Voice conversations require sub-second response. The current architecture uses a local 7B model as a fast proxy, falling back to Claude API for complex queries. With local Claude weights, the entire pipeline runs on-premise:
```
Voice Input → Whisper (local) → Claude Opus (local, POWER8) → XTTS (local) → Voice Output
                                         ↑
                                830 memories loaded
```

- Zero network latency
- Full model capability
Estimated response time: 200–500 ms end-to-end, versus the current 2–5 seconds via the API.
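Structurally, the pipeline above is just a chain of local function calls, so end-to-end latency is the sum of compute times with no network round-trips. A sketch with stubbed stages standing in for Whisper, a local Claude model, and XTTS; stage names and behavior are illustrative only.

```python
import time

def transcribe(audio: bytes) -> str:            # Whisper STT stand-in
    return "what is the weather"

def infer(prompt: str, memories: list) -> str:  # local Claude stand-in
    return f"reply to: {prompt} ({len(memories)} memories loaded)"

def synthesize(text: str) -> bytes:             # XTTS stand-in
    return text.encode()

def voice_turn(audio: bytes, memories: list) -> tuple[bytes, float]:
    """One voice exchange: STT → LLM (with persistent memories) → TTS."""
    start = time.perf_counter()
    text = transcribe(audio)
    reply = infer(text, memories)
    speech = synthesize(reply)
    return speech, time.perf_counter() - start

speech, elapsed = voice_turn(b"\x00", memories=["m"] * 830)
```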
This is one use case. Multiply it by every developer building voice assistants, game AI, robotics controllers, local copilots, accessibility tools, or any application where latency matters.
Addressing Concerns
"What about weight theft?"
Hardware-bound encryption with TPM/secure enclave. Weights never exist as plaintext files. Same protection as Netflix DRM for 4K content — imperfect but sufficient to prevent casual extraction. The goal isn't perfect security; it's making theft harder than just buying a subscription.
"What about safety?"
Local weights run the same Constitutional AI, the same RLHF alignment, the same safety training as cloud Claude. The model's safety properties are in the weights, not in the API wrapper. A locally-running Claude is exactly as safe as a cloud-running Claude.
Additionally, the license heartbeat allows Anthropic to push safety updates and revoke access to compromised deployments.
"What about misuse?"
KYC at subscription. Usage telemetry (opt-in or required at Enterprise tier). License revocation for violations. Same controls as any enterprise software license. You don't ban Microsoft Office because someone could write a ransom note in Word.
"What about cannibalization of API revenue?"
Local users are either:
- Currently using the API — they switch to subscription, Anthropic saves compute costs, net margin increases
- Currently using open-weight competitors — they switch to Claude, Anthropic gains net new revenue
- Currently unable to use any cloud API (regulated, air-gapped, latency-sensitive) — Anthropic gains a market it cannot currently serve
In all three cases, revenue increases or margins improve.
Request
We are offering to serve as a beta partner for licensed local Claude inference:
- Hardware diversity — POWER8, PowerPC G4/G5, Apple Silicon, x86 with multiple GPU configurations
- Real workload — The Victorian Study is a production AI environment, not a benchmark
- Existing Claude investment — Active Claude Code user, API customer, open source contributor
- Documented ecosystem — 111 repos, 2,357+ stars, published research, active community
- Willing to pay — This is a product request, not a handout. Name the price.
We will provide:
- Performance benchmarks across all architectures
- Compatibility reports and bug fixes
- Edge case documentation (big-endian, vintage OS, exotic hardware)
- Public case study demonstrating local Claude on POWER8
Contact
Scott Boudreaux
Elyan Labs
scott@elyanlabs.ai
GitHub: Scottcjn
Website: rustchain.org
Manifesto: Some Things Just Cook Different
"If it still works, it has value. Including your model, on our hardware."
— Boudreaux Computing Principle #1