Commit d5664f2

jovanSAPFIONEER committed:
v3.3.4 -- API architecture docs, demo launcher, code simplifications, untrack runtime blackboard

1 parent 1182add

9 files changed: 334 additions & 266 deletions

.gitignore

Lines changed: 5 additions & 0 deletions

```diff
@@ -33,3 +33,8 @@ dist/
 # Test artifacts
 test-*.log
 security-audit.log
+
+# Runtime state (auto-generated, changes every run)
+swarm-blackboard.md
+err.txt
+examples/output/
```

CHANGELOG.md

Lines changed: 13 additions & 0 deletions

```diff
@@ -5,6 +5,19 @@ All notable changes to Network-AI will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [3.3.4] - 2026-02-21
+
+### Added
+- **API Architecture & Performance** section in README -- explains single-key rate limits, multi-key parallelism, local GPU setup, cloud provider comparison table, and `max_completion_tokens` guidance
+- **`run.ts` demo launcher** -- interactive menu to run any of the 5 examples via `npx ts-node run.ts`
+
+### Changed
+- `tsconfig.json` -- exclude `examples/output/` and `**/fixed-*.ts` from compilation
+
+### Fixed
+- `SharedBlackboard.validateValue` -- removed redundant `undefined` pre-check; `JSON.stringify` try/catch handles all unsupported types correctly
+- `TaskDecomposer` -- simplified task result caching; removed duplicate failure propagation block that shadowed adapter error handling
+
 ## [3.2.11] - 2026-02-19
 
 ### Security
```

README.md

Lines changed: 185 additions & 1 deletion

```diff
@@ -4,7 +4,7 @@
 [![CI](https://github.com/jovanSAPFIONEER/Network-AI/actions/workflows/ci.yml/badge.svg)](https://github.com/jovanSAPFIONEER/Network-AI/actions/workflows/ci.yml)
 [![CodeQL](https://github.com/jovanSAPFIONEER/Network-AI/actions/workflows/codeql.yml/badge.svg)](https://github.com/jovanSAPFIONEER/Network-AI/actions/workflows/codeql.yml)
-[![Release](https://img.shields.io/badge/release-v3.3.2-blue.svg)](https://github.com/jovanSAPFIONEER/Network-AI/releases)
+[![Release](https://img.shields.io/badge/release-v3.3.4-blue.svg)](https://github.com/jovanSAPFIONEER/Network-AI/releases)
 [![npm](https://img.shields.io/npm/dw/network-ai.svg?label=npm%20downloads)](https://www.npmjs.com/package/network-ai)
 [![ClawHub](https://img.shields.io/badge/ClawHub-network--ai-orange.svg)](https://clawhub.ai/skills/network-ai)
 [![Node.js](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen.svg)](https://nodejs.org)
```
@@ -542,6 +542,190 @@ class MyAdapter extends BaseAdapter {

See [references/adapter-system.md](references/adapter-system.md) for the full adapter architecture guide.

## API Architecture & Performance

**Your swarm is only as fast as the backend it calls into.**

Network-AI is backend-agnostic — every agent in a swarm can call a cloud API, a different cloud API, or a local GPU model. That choice has a direct and significant impact on speed, parallelism, and reliability.

### Why It Matters

When you run a 5-agent swarm, Network-AI can dispatch all 5 calls simultaneously. Whether those calls actually execute in parallel depends entirely on what's behind each agent:

| Backend | Parallelism | Typical 5-agent swarm | Notes |
|---|---|---|---|
| **Single cloud API key** (OpenAI, Anthropic, etc.) | Rate-limited | 40–70s sequential | RPM limits force sequential dispatch + retry waits |
| **Multiple API keys / providers** | True parallel | 8–15s | Each agent hits a different key or provider |
| **Local GPU** (Ollama, llama.cpp, vLLM) | True parallel | 5–20s depending on hardware | No RPM limit — all 5 agents fire simultaneously |
| **Mixed** (some cloud, some local) | Partial | Varies | Local agents never block; cloud agents rate-paced |
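The timing bands in the table can be reproduced with a back-of-envelope model. The sketch below is my own simplification, not Network-AI code: treat a rate-limited key as allowing one call per `60/rpm` seconds, and an unlimited backend as firing everything at once.

```typescript
// Rough dispatch-time model for an N-agent swarm (assumed simplification, not library code).
// perCallSec: typical latency of one model call; rpm: the key's requests-per-minute
// limit (use Infinity for a local GPU or self-hosted endpoint with no limit).
function estimateSwarmSeconds(agents: number, perCallSec: number, rpm: number): number {
  const spacingSec = 60 / rpm; // minimum gap between calls on this key (0 when rpm = Infinity)
  if (spacingSec < 1) return perCallSec; // generous limit: calls overlap, effectively parallel
  // Paced dispatch: each earlier call waits out the spacing; the last response still takes perCallSec.
  return (agents - 1) * Math.max(spacingSec, perCallSec) + perCallSec;
}

// Single 6 RPM key, ~7s calls, 5 agents: 4 * 10 + 7 = 47s (inside the 40–70s band)
console.log(estimateSwarmSeconds(5, 7, 6));
// 500 RPM key: 0.12s spacing, so all 5 calls overlap: ~7s
console.log(estimateSwarmSeconds(5, 7, 500));
// Local GPU, ~3s calls, no limit: ~3s
console.log(estimateSwarmSeconds(5, 3, Infinity));
```

The `spacingSec < 1` threshold is an arbitrary cutoff for "generous enough to ignore"; real pacing also depends on token-per-minute limits and burst allowances.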
### The Single-Key Rate Limit Problem

Cloud APIs enforce **Requests Per Minute (RPM)** limits per API key. When you run 5 agents sharing one key and hit the ceiling, the API silently returns empty responses — not a 429 error, just blank content. Network-AI's swarm demos handle this automatically with **sequential dispatch** (one agent at a time) and **adaptive header-based pacing** that reads the `x-ratelimit-reset-requests` header to wait exactly as long as needed before the next call.

```
Single key (gpt-5.2, 6 RPM limit):
  Agent 1 ──call──▶ response (7s)
  wait 1s
  Agent 2 ──call──▶ response (7s)
  wait 1s
  ... (sequential)
Total: ~60s for 5 agents + coordinator
```
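A minimal sketch of what header-based pacing looks like, assuming the reset header carries durations such as `120ms`, `1s`, or `1m30s`. The duration format and the `pacedCall` wrapper are assumptions for illustration; Network-AI's internal implementation may differ.

```typescript
// Parse a duration string such as "120ms", "1s", or "1m30s" into milliseconds.
function parseResetHeader(value: string): number {
  let ms = 0;
  // The alternation tries "ms" before bare "s", so "120ms" is not read as 120 minutes + stray s.
  const re = /(\d+(?:\.\d+)?)(ms|s|m|h)/g;
  for (const [, num, unit] of value.matchAll(re)) {
    const n = parseFloat(num);
    ms += unit === 'ms' ? n : unit === 's' ? n * 1000 : unit === 'm' ? n * 60_000 : n * 3_600_000;
  }
  return ms;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// After each call, wait exactly as long as the API asks before dispatching the next one.
async function pacedCall<T>(
  call: () => Promise<{ data: T; headers: Record<string, string> }>,
): Promise<T> {
  const { data, headers } = await call();
  const reset = headers['x-ratelimit-reset-requests'];
  if (reset) await sleep(parseResetHeader(reset));
  return data;
}
```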
### Multiple Keys or Providers = True Parallel

Register each reviewer agent against a different API key or provider and dispatch fires all 5 simultaneously:

```typescript
import OpenAI from 'openai';
import { CustomAdapter, AdapterRegistry } from 'network-ai';

// Each agent points to a different OpenAI key.
// REVIEWERS and extractContent come from the surrounding demo code.
const registry = new AdapterRegistry();

for (const reviewer of REVIEWERS) {
  const adapter = new CustomAdapter();
  const client = new OpenAI({ apiKey: process.env[`OPENAI_KEY_${reviewer.id.toUpperCase()}`] });

  adapter.registerHandler(reviewer.id, async (payload) => {
    const resp = await client.chat.completions.create({ /* ... */ });
    return { findings: extractContent(resp) };
  });

  registry.register(reviewer.id, adapter);
}

// Now all 5 dispatch in parallel via Promise.all
// Total: ~8-12s instead of ~60s
```
### Local GPU = Zero Rate Limits

Run Ollama or any OpenAI-compatible local server and drop it in as a backend. With no RPM ceiling, all agents fire at once — true parallelism for free:

```typescript
// Point any agent at a local Ollama or vLLM server
const localClient = new OpenAI({
  apiKey: 'not-needed', // local servers accept any key
  baseURL: 'http://localhost:11434/v1',
});

adapter.registerHandler('sec_review', async (payload) => {
  const resp = await localClient.chat.completions.create({
    model: 'llama3.2', // or mistral, deepseek-r1, codellama, etc.
    messages: [/* ... */],
  });
  return { findings: extractContent(resp) };
});
```
### Mixing Cloud and Local

The adapter system makes it trivial to give some agents a cloud backend and others a local one:

```typescript
// Fast local model for lightweight reviewers
registry.register('test_review', localAdapter);
registry.register('arch_review', localAdapter);

// Cloud model for high-stakes reviewers
registry.register('sec_review', cloudAdapter); // GPT-4o / Claude
```

Network-AI's orchestrator, blackboard, and trust model stay identical regardless of what's behind each adapter. The only thing that changes is speed.
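The split above can also be expressed as data rather than repeated `register` calls. A sketch with a hypothetical tier table (the agent-to-tier mapping is mine, not a Network-AI concept):

```typescript
// Hypothetical routing table: which backend tier each reviewer should use.
const AGENT_TIERS: Record<string, 'local' | 'cloud'> = {
  test_review: 'local', // lightweight: a fast local model is enough
  arch_review: 'local',
  sec_review: 'cloud',  // high-stakes: strongest available model
};

function pickTier(agentId: string): 'local' | 'cloud' {
  return AGENT_TIERS[agentId] ?? 'local'; // default unknown agents to the cheap tier
}

// Usage sketch:
// registry.register(id, pickTier(id) === 'cloud' ? cloudAdapter : localAdapter);
```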
### Summary

| You have | What to expect |
|---|---|
| One cloud API key | Sequential dispatch, 40–70s per 5-agent swarm — fully handled automatically |
| Multiple cloud keys | Near-parallel, 10–15s — use one key per adapter instance |
| Local GPU (Ollama, vLLM) | True parallel, 5–20s depending on hardware |
| Home GPU + cloud mix | Local agents never block — cloud agents rate-paced independently |

The framework doesn't get in the way of any of these setups. Connect whatever backend you have and the orchestration layer handles the rest.
### Cloud Provider Performance

Not all cloud APIs perform the same. Model size, inference infrastructure, and tier all affect how fast each agent gets a response — and that directly multiplies across every agent in your swarm.

| Provider / Model | Avg response (5-agent swarm) | RPM limit (free/tier-1) | Notes |
|---|---|---|---|
| **OpenAI gpt-5.2** | 6–10s per call | 3–6 RPM | Flagship model, high latency, strict RPM |
| **OpenAI gpt-4o-mini** | 2–4s per call | 500 RPM | Fast, cheap, good for reviewer agents |
| **OpenAI gpt-4o** | 4–7s per call | 60–500 RPM | Balanced quality/speed |
| **Anthropic Claude 3.5 Haiku** | 2–3s per call | 50 RPM | Fastest Claude, great for parallel agents |
| **Anthropic Claude 3.7 Sonnet** | 4–8s per call | 50 RPM | Stronger reasoning, higher latency |
| **Google Gemini 2.0 Flash** | 1–3s per call | 15 RPM (free) | Very fast inference, low RPM on free tier |
| **Groq (Llama 3.3 70B)** | 0.5–2s per call | 30 RPM | Fastest cloud inference available |
| **Together AI / Fireworks** | 1–3s per call | Varies by plan | Good for parallel workloads, competitive RPM |

**Key insight:** A 5-agent swarm using `gpt-4o-mini` at 500 RPM can fire all 5 agents truly in parallel and finish in ~4s total. The same swarm on `gpt-5.2` at 6 RPM must go sequential and takes 60s. **The model tier matters more than the orchestration framework.**
#### Choosing a Model for Swarm Agents

- **Speed over depth** (many agents, real-time feedback) → `gpt-4o-mini`, `gpt-5-mini`, `claude-3.5-haiku`, `gemini-2.0-flash`, `groq/llama-3.3-70b`
- **Depth over speed** (fewer agents, high-stakes output) → `gpt-4o`, `claude-3.7-sonnet`, `gpt-5.2`
- **Free / no-cost testing** → Groq free tier, Gemini free tier, or Ollama locally
- **Production swarms with budget** → Multiple keys across providers, route different agents to different models

All of these plug into Network-AI through the `CustomAdapter` by swapping the client's `baseURL` and `model` string — no other code changes needed.
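Since the swap is just `baseURL` + `model`, the providers can live in a preset table. A sketch with illustrative endpoints (verify base URLs and model names against each provider's documentation before relying on them):

```typescript
// Illustrative provider presets -- base URLs and model names are assumptions to verify.
const BACKENDS = {
  openai: { baseURL: 'https://api.openai.com/v1', model: 'gpt-4o-mini' },
  groq:   { baseURL: 'https://api.groq.com/openai/v1', model: 'llama-3.3-70b-versatile' },
  ollama: { baseURL: 'http://localhost:11434/v1', model: 'llama3.2' },
} as const;

type BackendName = keyof typeof BACKENDS;

// Returns the constructor options for an OpenAI-compatible client plus the model string.
function clientConfigFor(name: BackendName, apiKey: string) {
  const { baseURL, model } = BACKENDS[name];
  return { clientOptions: { baseURL, apiKey }, model };
}

// Usage sketch:
// const { clientOptions, model } = clientConfigFor('groq', process.env.GROQ_API_KEY!);
// const client = new OpenAI(clientOptions);
```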
### `max_completion_tokens` — The Silent Truncation Trap

One of the most common failure modes in agentic output tasks is **silent truncation**. When a model hits the `max_completion_tokens` ceiling it stops mid-output and returns whatever it has — no error, no warning. The API call succeeds with a 200 and `finish_reason: "length"` instead of `"stop"`.

**This is especially dangerous for code-rewrite agents** where the output is a full file. A fixed `max_completion_tokens: 3000` cap will silently drop everything after line ~150 of a 200-line fix.

```
# What you set vs what you need

max_completion_tokens: 3000 → enough for a short blog post
                            → NOT enough for a 200-line code rewrite

# Real numbers (gpt-5-mini, order-service.ts rewrite):
Blockers section: ~120 tokens
Fixed code:       ~2,800 tokens (213 lines with // FIX: comments)
Total needed:     ~3,000 tokens ← hits the cap exactly, empty output
Fix: set to 16,000 → full rewrite delivered in one shot
```
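This failure mode is cheap to guard against: inspect `finish_reason` on every completion. A sketch against the shape of an OpenAI chat completion choice (the error class is my own):

```typescript
// Minimal shape of one completion choice, enough for the guard below.
interface ChoiceLike {
  finish_reason: string; // "stop", "length", "content_filter", ...
  message: { content: string | null };
}

class TruncatedOutputError extends Error {}

// Returns the content only when the model stopped on its own; throws on a cap hit
// instead of silently passing a partial file downstream.
function contentOrThrow(choice: ChoiceLike): string {
  if (choice.finish_reason === 'length') {
    throw new TruncatedOutputError(
      'Output hit max_completion_tokens and was truncated -- raise the cap and retry',
    );
  }
  return choice.message.content ?? '';
}
```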
**Lessons learned from building the code-review swarm:**

| Issue | Root cause | Fix |
|---|---|---|
| Fixed code output was empty | `max_completion_tokens: 3000` too low for a full rewrite | Raise to `16000`+ for any code-output agent |
| `finish_reason: "length"` silently discards output | Model hits cap, returns partial response with no error | Always check `choices[0].finish_reason` and alert on `"length"` |
| `gpt-5.2` slow + expensive for reviewer agents | Flagship model = high latency + $14/1M output tokens | Use `gpt-5-mini` ($2/1M, 128k output, same RPM) for reviewer/fixer agents |
| Coordinator + fixer as two separate calls | Second call hits rate limit window, adds 60s wait | Merge into one combined call with a structured two-section response format |
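The last fix in the table (merging coordinator + fixer into one call) needs the single response split back into its two sections. A sketch assuming hypothetical `## BLOCKERS` / `## FIXED CODE` markers — the demo's actual response format may differ:

```typescript
// Split a combined coordinator+fixer response on the assumed section markers.
function splitTwoSection(text: string): { blockers: string; fixedCode: string } {
  const m = text.match(/## BLOCKERS\s*([\s\S]*?)## FIXED CODE\s*([\s\S]*)/);
  if (!m) throw new Error('Response missing the expected two-section markers');
  return { blockers: m[1].trim(), fixedCode: m[2].trim() };
}
```

Pairing this with a `finish_reason` check matters: a truncated combined response usually loses its second marker, so the parser fails loudly instead of returning half a file.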
**Rule of thumb for `max_completion_tokens` by task:**

| Task | Recommended cap |
|---|---|
| Short classification / sentiment | 200–500 |
| Code review findings (one reviewer) | 400–800 |
| Blocker summary (coordinator) | 500–1,000 |
| Full file rewrite (≤300 lines) | 12,000–16,000 |
| Full file rewrite (≤1,000 lines) | 32,000–64,000 |
| Document / design revision | 16,000–32,000 |

All GPT-5 variants (`gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-5.2`) support **128,000 max output tokens** — the ceiling is never the model, it's always the cap you set.
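The table transcribes directly into a helper, shown here taking the upper bound of each range (task names are my own labels):

```typescript
// Hypothetical task labels mapped to the upper bound of each recommended range above.
type TaskKind =
  | 'classification'
  | 'review-findings'
  | 'blocker-summary'
  | 'file-rewrite-small'  // <= ~300 lines
  | 'file-rewrite-large'  // <= ~1,000 lines
  | 'document-revision';

const CAPS: Record<TaskKind, number> = {
  'classification': 500,
  'review-findings': 800,
  'blocker-summary': 1_000,
  'file-rewrite-small': 16_000,
  'file-rewrite-large': 64_000,
  'document-revision': 32_000,
};

const recommendedCap = (task: TaskKind): number => CAPS[task];
```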
#### Cloud GPU Instances (Self-Hosted on AWS / GCP / Azure)

Running your own model on a cloud GPU VM (e.g. AWS `p3.2xlarge` / A100, GCP `a2-highgpu`, Azure `NC` series) sits between managed APIs and local hardware:

| Setup | Parallelism | Speed vs managed API | RPM limit |
|---|---|---|---|
| A100 (80GB) + vLLM, Llama 3.3 70B | True parallel | **Faster** — 0.5–2s per call | None |
| H100 + vLLM, Mixtral 8x7B | True parallel | **Faster** — 0.3–1s per call | None |
| T4 / V100 + Ollama, Llama 3.2 8B | True parallel | Comparable | None |

Since you own the endpoint, there are no rate limits — all 5 agents fire at the same moment. At A100 inference speeds, a 5-agent swarm can complete in **3–8 seconds** for a 70B model, comparable to Groq and faster than any managed flagship model.

The tradeoff is cost (GPU VMs are $1–$5/hr) and setup (vLLM install, model download). For high-volume production swarms or teams that want no external API dependency, it's the fastest architecture available. The connection is identical to local Ollama — just point `baseURL` at your VM's IP.
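Whether the $1–$5/hr is worth it comes down to simple arithmetic. A back-of-envelope sketch that assumes the GPU is kept busy; idle time makes the real cost per call higher:

```typescript
// Cost per call for a self-hosted GPU that is fully utilized (back-of-envelope only).
function gpuCostPerCall(hourlyUsd: number, secondsPerCall: number, concurrentCalls = 1): number {
  const callsPerHour = (3600 / secondsPerCall) * concurrentCalls;
  return hourlyUsd / callsPerHour;
}

// A100 at $2/hr serving 1s calls, 5 at a time: $2 / 18,000 calls ≈ $0.00011 per call
console.log(gpuCostPerCall(2, 1, 5));
```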
## Permission System

The AuthGuardian evaluates requests using:

index.ts

Lines changed: 4 additions & 28 deletions

```diff
@@ -407,15 +407,8 @@ class SharedBlackboard {
    * Prevents DoS via oversized writes and circular data.
    */
   private validateValue(value: unknown): { valid: boolean; reason?: string } {
-    if (value === undefined) {
-      return { valid: false, reason: 'Value is undefined and cannot be stored' };
-    }
     try {
       const serialized = JSON.stringify(value);
-      if (serialized === undefined) {
-        // JSON.stringify returns undefined for unsupported root types (functions, symbols, etc.)
-        return { valid: false, reason: 'Value cannot be serialized (unsupported type)' };
-      }
       if (serialized.length > CONFIG.maxBlackboardValueSize) {
         return { valid: false, reason: `Value exceeds max size (${serialized.length} > ${CONFIG.maxBlackboardValueSize} bytes)` };
       }
```
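The changelog claims the try/catch alone covers every unsupported type. A standalone sketch (mirroring, not copying, the library code; `maxSize` stands in for `CONFIG.maxBlackboardValueSize`) shows why: `JSON.stringify(undefined)` returns `undefined`, so the subsequent `.length` access throws and lands in the catch.

```typescript
// Standalone sketch of the slimmed-down check. For undefined, functions, and symbols,
// JSON.stringify returns undefined at runtime, so reading .length throws a TypeError
// that the existing try/catch converts into a validation failure -- no pre-check needed.
function validateValue(value: unknown, maxSize = 1024): { valid: boolean; reason?: string } {
  try {
    const serialized = JSON.stringify(value);
    if (serialized.length > maxSize) {
      return { valid: false, reason: `Value exceeds max size (${serialized.length} > ${maxSize} bytes)` };
    }
    return { valid: true };
  } catch {
    // Covers circular structures, BigInt, and unserializable root types alike.
    return { valid: false, reason: 'Value cannot be serialized' };
  }
}
```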
```diff
@@ -1208,14 +1201,10 @@ class TaskDecomposer {
 
         individualResults.push(result);
 
-        // Cache successful results (non-fatal if caching fails)
-        if (result.success && result.result !== undefined) {
-          try {
-            const cacheKey = `task:${task.agentType}:${this.hashPayload(task.taskPayload)}`;
-            this.blackboard.write(cacheKey, result.result, context.agentId, 3600, 'system-orchestrator-token'); // 1 hour TTL
-          } catch {
-            // Caching failure should not abort the entire wave
-          }
+        // Cache successful results
+        if (result.success) {
+          const cacheKey = `task:${task.agentType}:${this.hashPayload(task.taskPayload)}`;
+          this.blackboard.write(cacheKey, result.result, context.agentId, 3600, 'system-orchestrator-token'); // 1 hour TTL
         }
       }
     }
@@ -1293,19 +1282,6 @@ class TaskDecomposer {
 
     const result = await this.adapterRegistry.executeAgent(task.agentType, agentPayload, agentContext);
 
-    // If the adapter returned a failure, propagate it
-    if (!result.success) {
-      return {
-        agentType: task.agentType,
-        success: false,
-        result: {
-          error: result.error?.message ?? 'Agent execution failed',
-          recoverable: result.error?.recoverable ?? true,
-        },
-        executionTime: Date.now() - taskStart,
-      };
-    }
-
     // Sanitize adapter output before returning/caching
     let sanitizedData = result.data;
     try {
```

package.json

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,6 +1,6 @@
 {
   "name": "network-ai",
-  "version": "3.3.3",
+  "version": "3.3.4",
   "description": "AI agent orchestration framework for TypeScript/Node.js - plug-and-play multi-agent coordination with 12 frameworks (LangChain, AutoGen, CrewAI, OpenAI Assistants, LlamaIndex, Semantic Kernel, Haystack, DSPy, Agno, MCP, OpenClaw). Built-in security, swarm intelligence, and agentic workflow patterns.",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",
```
