# Remediation Plan

Based on: `AUDIT_REPORT.md` dated February 20, 2026
Scope: All PARTIAL and NOT DONE items across Phases 0–3
Total items: 17 fix tasks across 5 weeks
Execution order: Strict — later tasks depend on earlier ones
Each task has:
- A prerequisite list — complete these first
- Numbered steps — exact, ordered actions
- A verification step — how to confirm the fix worked
- A time estimate — realistic solo-developer effort
Work top to bottom. Do not skip tasks or reorder them.
## Task 1: Remove the leaked Google API key

File: `tools/testing/test_persistent_gui_fix.py:27`
Prerequisites: None
Effort: ~10 minutes
Steps:
1. Open `tools/testing/test_persistent_gui_fix.py`
2. At the top of the file, verify `import os` is present; add it if missing
3. Find line 27, where `llm_api_key="AIzaSyCWUpvNYmalx0whFyG6eIIcSY__ioMSZEc"` is hardcoded inside a `FreeCADCLI(...)` constructor call
4. Replace that argument value with `os.environ.get("GOOGLE_API_KEY", "test-placeholder-key")`
5. Run a global search to confirm no other file contains the key: `grep -r "AIzaSy" . --include="*.py" --include="*.yaml" | grep -v docs/ | grep -v __pycache__`
6. Stage and commit: `git add tools/testing/test_persistent_gui_fix.py && git commit -m "fix(security): remove hardcoded Google API key from test file"`
Verification: The `grep` command from step 5 returns zero results in `.py`/`.yaml` files.
## Task 2: Remove hardcoded machine paths

Files: `config/config.yaml:11`, `tools/testing/test_realtime_commands.py:185,197`, `tools/gui/simple_gui_launcher.py:115`, `tools/utilities/verify_real_objects.py:15`
Prerequisites: None
Effort: ~30 minutes
Steps:
2a — Fix config/config.yaml:
1. Open `config/config.yaml`
2. Line 11 has `appimage_path: "/home/vansh5632/Downloads/FreeCAD_1.0.1-conda-Linux-x86_64-py311.AppImage"`
3. Replace the value with an empty string `""` — the `FreeCADPathResolver` in `freecad/path_resolver.py` will pick up the correct path from `FREECAD_APPIMAGE_PATH` at runtime
4. Add a comment above it: `# Set via FREECAD_APPIMAGE_PATH env var or leave empty for auto-detection`
5. Verify `FREECAD_APPIMAGE_PATH=` is present in `.env.example`; add it if missing

2b — Fix tools/testing/test_realtime_commands.py:
1. Lines 185 and 197 have `outputs_dir = "/home/vansh5632/DesignEng/freecad-llm-automation/outputs"`
2. Replace both occurrences with `outputs_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "outputs")`
3. Confirm `import os` exists at the top; add it if missing

2c — Fix tools/gui/simple_gui_launcher.py:
1. Line 115 has the same hardcoded outputs path
2. Replace it with the same `os.path.join(os.path.dirname(...))` pattern as 2b

2d — Fix tools/utilities/verify_real_objects.py:
1. Line 15 has a hardcoded `.FCStd` file path
2. Replace it with `sys.argv[1]` so callers provide the path at runtime; add a usage message and `sys.exit(1)` if no argument is given
3. Add `import sys` at the top if missing
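The 2b and 2d patterns can be sketched together. The helper names below (`outputs_dir_for`, `get_fcstd_path`) are illustrative only — the actual steps inline `__file__` and `sys.argv` directly; wrapping them in functions just makes the pattern testable:

```python
import os
import sys


def outputs_dir_for(source_file: str) -> str:
    """2b/2c pattern: resolve outputs/ relative to the calling file
    (pass __file__) instead of hardcoding an absolute machine path."""
    here = os.path.dirname(os.path.abspath(source_file))
    return os.path.normpath(os.path.join(here, "..", "..", "outputs"))


def get_fcstd_path(argv: list) -> str:
    """2d pattern: take the .FCStd path from the command line; print a
    usage message and exit(1) when it is missing."""
    if len(argv) < 2:
        print(f"Usage: {argv[0]} <path/to/model.FCStd>", file=sys.stderr)
        sys.exit(1)
    return argv[1]
```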
Verification: `grep -rn "/home/vansh5632" . --include="*.py" --include="*.yaml" | grep -v docs/ | grep -v __pycache__` returns zero results.
## Task 3: Move LLM schemas into the shared schemas package

Files created: `src/ai_designer/schemas/llm_schemas.py`
Files modified: `src/ai_designer/core/llm_provider.py`, `src/ai_designer/schemas/__init__.py`
Prerequisites: None
Effort: ~45 minutes
Steps:
1. Create `src/ai_designer/schemas/llm_schemas.py`
2. Move these five classes from `src/ai_designer/core/llm_provider.py` into the new file — they are currently Pydantic `BaseModel`/`Enum` definitions: `LLMProvider`, `LLMRole`, `LLMMessage`, `LLMRequest`, `LLMResponse`
3. In `core/llm_provider.py`, delete the moved definitions and add: `from ai_designer.schemas.llm_schemas import LLMProvider, LLMRole, LLMMessage, LLMRequest, LLMResponse`
4. Open `src/ai_designer/schemas/__init__.py` and add the five new symbols to the import block and to `__all__`
5. Run: `make test-unit`
Verification: `python -c "from ai_designer.schemas import LLMRequest, LLMResponse, LLMMessage; print('OK')"` prints OK.
## Task 4: Move API route schemas into the shared schemas package

Files created: `src/ai_designer/schemas/api_schemas.py`
Files modified: `src/ai_designer/api/routes/design.py`, `src/ai_designer/schemas/__init__.py`
Prerequisites: Task 3
Effort: ~45 minutes
Steps:
1. Create `src/ai_designer/schemas/api_schemas.py`
2. Move these three inline Pydantic models from `src/ai_designer/api/routes/design.py` into the new file: `DesignRequest`, `DesignResponse`, `DesignStatusResponse`
3. Note: `design.py` already imports `DesignRequest` from `ai_designer.schemas.design_state` under the alias `DesignRequestSchema` — rename the moved `DesignRequest` in the new schemas file to `DesignCreateRequest` to avoid the collision
4. In `api/routes/design.py`, delete the moved model definitions and add an import from `ai_designer.schemas.api_schemas`; update all references to the renamed `DesignCreateRequest`
5. Add `DesignCreateRequest`, `DesignResponse`, `DesignStatusResponse` to the `schemas/__init__.py` exports
6. Run: `make test-unit && make test-integration`
Verification: `python -c "from ai_designer.schemas import DesignCreateRequest, DesignResponse; print('OK')"` prints OK. All route integration tests pass.
## Task 5: Create BaseAgent and refactor all agents

Files created: `src/ai_designer/agents/base.py`, `tests/unit/agents/test_base.py`
Files modified: `src/ai_designer/agents/planner.py`, `src/ai_designer/agents/generator.py`, `src/ai_designer/agents/validator.py`, `src/ai_designer/agents/orchestrator.py`, `src/ai_designer/agents/__init__.py`
Prerequisites: Task 3
Effort: ~3 hours
Steps:
5a — Create base.py:
1. Create `src/ai_designer/agents/base.py`
2. Define `BaseAgent(ABC)` with an `__init__` accepting `llm_provider: UnifiedLLMProvider`, `max_retries: int = 3`, `agent_type: AgentType`
3. Instance attributes: `self.llm_provider`, `self.max_retries`, `self.agent_type`, `self.logger = get_logger(self.__class__.__name__)`
4. Abstract method `execute(self, state: dict) -> dict`
5. Protected method `_build_llm_request(self, messages, model=None, temperature=None) -> LLMRequest` — builds an `LLMRequest` using `LLMMessage`; falls back to `self.default_temperature` if `temperature` is `None`
6. Protected method `_call_llm(self, request: LLMRequest) -> LLMResponse` — calls `self.llm_provider.complete(request)` in a retry loop up to `self.max_retries`, catching `LLMError`, logging each attempt, and re-raising after the final failure
7. Property `name -> str` returning `self.__class__.__name__`

5b — Inherit BaseAgent in each agent:
1. In `planner.py`: add `from ai_designer.agents.base import BaseAgent`, change the class to inherit `BaseAgent`, call `super().__init__(...)` first in `__init__`, remove the duplicated `self.llm_provider`, `self.max_retries`, `self.agent_type`, `self.logger` assignments, and replace inline retry loops with calls to `self._call_llm(...)`
2. Repeat for `generator.py` and `validator.py` with their respective `AgentType` values
3. For `orchestrator.py` — inherit `BaseAgent`, but its `execute()` delegates to sub-agents rather than calling the LLM directly, so it is a lighter change

5c — Update agents/__init__.py:
1. Add `from .base import BaseAgent` and include `BaseAgent` in `__all__`

5d — Write tests in tests/unit/agents/test_base.py:
1. Create a concrete subclass in the test file; verify instantiation works
2. Test that `_call_llm` retries exactly `max_retries` times on `LLMError` before raising
3. Test that `_build_llm_request` returns a valid `LLMRequest` with the provided messages
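The shape of `base.py` described in 5a can be sketched as follows. This is a minimal, self-contained sketch: `LLMError`, `AgentType`, and the `default_temperature` class attribute are stand-ins for the repo's real definitions, the logger uses stdlib `logging` instead of `get_logger`, and `_build_llm_request` returns a dict where the real code returns an `LLMRequest`:

```python
import logging
from abc import ABC, abstractmethod
from enum import Enum


class LLMError(Exception):
    """Stand-in for the provider's failure exception."""


class AgentType(Enum):  # stand-in for ai_designer's AgentType
    PLANNER = "planner"
    GENERATOR = "generator"
    VALIDATOR = "validator"
    ORCHESTRATOR = "orchestrator"


class BaseAgent(ABC):
    default_temperature = 0.3  # assumed default; real value lives in config

    def __init__(self, llm_provider, max_retries: int = 3,
                 agent_type: AgentType = AgentType.GENERATOR):
        self.llm_provider = llm_provider
        self.max_retries = max_retries
        self.agent_type = agent_type
        self.logger = logging.getLogger(self.__class__.__name__)

    @property
    def name(self) -> str:
        return self.__class__.__name__

    @abstractmethod
    def execute(self, state: dict) -> dict:
        ...

    def _build_llm_request(self, messages, model=None, temperature=None) -> dict:
        # Real code constructs an LLMRequest from LLMMessage objects;
        # a plain dict stands in here.
        return {
            "messages": messages,
            "model": model,
            "temperature": (temperature if temperature is not None
                            else self.default_temperature),
        }

    def _call_llm(self, request):
        last_err = None
        for attempt in range(1, self.max_retries + 1):
            try:
                return self.llm_provider.complete(request)
            except LLMError as err:
                last_err = err
                self.logger.warning("LLM call failed (attempt %d/%d): %s",
                                    attempt, self.max_retries, err)
        raise last_err  # re-raise after the final failure
```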
Verification: `make test-unit` passes. `python -c "from ai_designer.agents.base import BaseAgent; print(BaseAgent.__abstractmethods__)"` prints `frozenset({'execute'})`.
## Task 6: Centralize per-agent model configuration

Files created: `src/ai_designer/llm/model_config.py`
Files modified: `src/ai_designer/agents/planner.py`, `src/ai_designer/agents/generator.py`, `src/ai_designer/agents/validator.py`, `src/ai_designer/agents/orchestrator.py`, `config/config.yaml`
Prerequisites: Task 5
Effort: ~2 hours
Steps:
1. Create `src/ai_designer/llm/model_config.py`
2. Define `AGENT_MODEL_CONFIG: dict` with four keys (`"planner"`, `"generator"`, `"validator"`, `"orchestrator"`), each mapping to a dict with: `primary` (litellm model string), `fallback` (model string), `temperature` (float), `max_tokens` (int)
   - Planner: `anthropic/claude-3-5-sonnet-20241022`, fallback `google/gemini-pro`, temp `0.3`, tokens `4096`
   - Generator: `openai/gpt-4o`, fallback `deepseek/deepseek-coder`, temp `0.2`, tokens `8192`
   - Validator: `anthropic/claude-3-5-sonnet-20241022`, fallback `openai/gpt-4o`, temp `0.3`, tokens `2048`
   - Orchestrator: `anthropic/claude-3-5-sonnet-20241022`, fallback `google/gemini-pro`, temp `0.4`, tokens `2048`
3. Add `get_agent_config(agent_name: str) -> dict` — returns the config, raises `KeyError` if not found
4. Add `get_env_override(agent_name: str, key: str) -> Optional[str]` — checks the env var `AGENT_{AGENT_NAME.upper()}_{KEY.upper()}`; returns the value if set, else `None`
5. Add an `llm_agents:` section to `config/config.yaml` mirroring these four entries (acts as a config-file override layer)
6. In each agent's `__init__`, replace hardcoded model strings with `get_agent_config("planner")["primary"]` etc.
7. Add `get_agent_config` to the `agents/__init__.py` exports so it's accessible from the package
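The module described above can be sketched directly from the table of models; only the values listed in the steps are used, and the env-override naming follows the `AGENT_{NAME}_{KEY}` convention stated in step 4:

```python
import os
from typing import Optional

AGENT_MODEL_CONFIG: dict = {
    "planner": {"primary": "anthropic/claude-3-5-sonnet-20241022",
                "fallback": "google/gemini-pro",
                "temperature": 0.3, "max_tokens": 4096},
    "generator": {"primary": "openai/gpt-4o",
                  "fallback": "deepseek/deepseek-coder",
                  "temperature": 0.2, "max_tokens": 8192},
    "validator": {"primary": "anthropic/claude-3-5-sonnet-20241022",
                  "fallback": "openai/gpt-4o",
                  "temperature": 0.3, "max_tokens": 2048},
    "orchestrator": {"primary": "anthropic/claude-3-5-sonnet-20241022",
                     "fallback": "google/gemini-pro",
                     "temperature": 0.4, "max_tokens": 2048},
}


def get_agent_config(agent_name: str) -> dict:
    # KeyError propagates for unknown agent names, per the plan.
    return AGENT_MODEL_CONFIG[agent_name]


def get_env_override(agent_name: str, key: str) -> Optional[str]:
    # e.g. AGENT_GENERATOR_PRIMARY overrides the generator's primary model.
    return os.environ.get(f"AGENT_{agent_name.upper()}_{key.upper()}")
```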
Verification: `python -c "from ai_designer.llm.model_config import get_agent_config; print(get_agent_config('generator')['primary'])"` prints `openai/gpt-4o`.
## Task 7: Implement live LLM cost tracking

Files modified: `src/ai_designer/core/llm_provider.py`, `src/ai_designer/schemas/llm_schemas.py`
Prerequisites: Task 3
Effort: ~1 hour
Steps:
1. Open `src/ai_designer/schemas/llm_schemas.py` (moved there in Task 3) and add a `cost_usd: Optional[float] = None` field to `LLMResponse`
2. Open `src/ai_designer/core/llm_provider.py`
3. After each successful `litellm.completion()` call, add: `cost = litellm.completion_cost(completion_response=response)` then `self.total_cost += cost if cost else 0.0`
4. Populate `cost_usd=cost` when constructing the `LLMResponse` return value
5. Log the per-call cost at `debug` level: include `cost_usd`, `model`, and total `tokens`
6. Add a `get_total_cost(self) -> float` method returning `self.total_cost`
7. Add a `reset_cost_tracking(self)` method setting `self.total_cost = 0.0`
8. Run: `make test-unit`
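The accumulation logic from steps 3, 6, and 7 can be sketched in isolation. `litellm.completion_cost` is the real API named in the plan; here a `cost_fn` parameter stands in for it so the sketch runs without litellm installed, and the `CostTracker` class is a hypothetical extraction, not a class from the repo:

```python
from typing import Callable, Optional


class CostTracker:
    """Sketch of the cost-accumulation behavior added to UnifiedLLMProvider."""

    def __init__(self, cost_fn: Callable[[object], Optional[float]]):
        self._cost_fn = cost_fn  # in real code: litellm.completion_cost
        self.total_cost = 0.0

    def record(self, completion_response) -> Optional[float]:
        cost = self._cost_fn(completion_response)
        # completion_cost can return None for unknown models; treat as 0.
        self.total_cost += cost if cost else 0.0
        return cost  # attach as cost_usd on the LLMResponse

    def get_total_cost(self) -> float:
        return self.total_cost

    def reset_cost_tracking(self) -> None:
        self.total_cost = 0.0
```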
Verification: In `test_llm_provider.py`, mock `litellm.completion_cost` to return `0.001` and assert `get_total_cost()` returns `0.001` after one call.
## Task 8: Add streaming support

Files modified: `src/ai_designer/core/llm_provider.py`, `src/ai_designer/api/routes/ws.py`
Prerequisites: Task 7
Effort: ~2 hours
Steps:
1. In `src/ai_designer/core/llm_provider.py`, add `async def complete_stream(self, request: LLMRequest) -> AsyncGenerator[str, None]` to `UnifiedLLMProvider`
2. Inside, call `litellm.acompletion(..., stream=True)` — use `acompletion` (async) rather than `completion` to avoid blocking
3. Iterate with `async for chunk in response:` and yield `chunk.choices[0].delta.content` when it is not `None` or empty
4. Do not retry streaming failures — add a comment documenting this; raise `LLMError` immediately on exception
5. In `src/ai_designer/api/routes/ws.py`, add a handler for the WebSocket message type `"stream_design"`: extract `prompt` from the message, call `complete_stream()` on the LLM provider, iterate the async generator and send each chunk back as a WebSocket text message with the structure `{"type": "stream_chunk", "content": chunk}`, and finally send `{"type": "stream_done"}` after the generator is exhausted
6. Add a test in `tests/unit/core/test_llm_provider.py` mocking `litellm.acompletion` with an async mock that yields streaming chunks
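The generator from steps 1–4 can be sketched as a standalone function. The `acompletion` callable is injected so the sketch runs without litellm (in real code it is `litellm.acompletion` bound to the provider); the chunk shape (`chunk.choices[0].delta.content`) matches the steps above:

```python
from typing import AsyncGenerator


class LLMError(Exception):
    """Stand-in for the provider's failure exception."""


async def complete_stream(acompletion, model: str,
                          messages: list) -> AsyncGenerator[str, None]:
    # Streaming failures are NOT retried: a partial stream cannot be
    # resumed transparently, so fail fast and let the caller restart.
    try:
        response = await acompletion(model=model, messages=messages,
                                     stream=True)
        async for chunk in response:
            content = chunk.choices[0].delta.content
            if content:  # skip None / empty deltas
                yield content
    except LLMError:
        raise
    except Exception as exc:
        raise LLMError(str(exc)) from exc
```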
Verification: `make test-unit` passes. The streaming mock test confirms chunks are yielded in order.
## Task 9: Unify the LLM manager

File modified: `src/ai_designer/llm/unified_manager.py`
Prerequisites: Task 3, Task 6
Effort: ~3 hours
Context: `unified_manager.py` uses the legacy `DeepSeekR1Client` and `LLMClient` with its own dataclass-based `LLMRequest`/`LLMResponse`. It is still used by `cli.py`. The goal is to delegate under the hood to `UnifiedLLMProvider` while preserving the external interface.
Steps:
1. In `UnifiedLLMManager.__init__`, create `self._provider = UnifiedLLMProvider(...)` using `model_config.get_agent_config("generator")["primary"]` as the default model
2. Find the main generation method (the method that accepts the legacy `LLMRequest` dataclass)
3. Replace its body: convert the legacy `LLMRequest` into a `schemas.llm_schemas.LLMRequest`, call `self._provider.complete(new_request)`, convert the returned `schemas.llm_schemas.LLMResponse` back to the legacy `LLMResponse` dataclass, and return it
4. Add `# DEPRECATED: Use ai_designer.schemas.llm_schemas.LLMRequest` comments on the legacy dataclass definitions at the top of the file
5. Add `# DEPRECATED: Kept for backward compat only` on the `DeepSeekR1Client` and `LLMClient` import lines
6. Do not delete anything yet — delegate and annotate only, to keep `cli.py` working
7. Run: `make test-unit`
Verification: The manager can be instantiated and its `generate()` call flows through to `UnifiedLLMProvider`. All existing unit tests pass.
## Task 10: Split cli.py into a package

Files created: `src/ai_designer/cli/__init__.py`, `src/ai_designer/cli/app.py`, `src/ai_designer/cli/commands.py`, `src/ai_designer/cli/display.py`, `src/ai_designer/cli/session.py`
Files deleted: `src/ai_designer/cli.py`
Files modified: `src/ai_designer/__main__.py`
Prerequisites: Task 9
Effort: ~1 day
Steps:
10a — Create the package skeleton:
1. Create the directory `src/ai_designer/cli/`
2. Create `cli/__init__.py` — re-export `FreeCADCLI` from `app.py` so that existing `from ai_designer.cli import FreeCADCLI` imports continue to work without changes

10b — Extract cli/session.py:
1. Create `cli/session.py`
2. Move all session-state attributes from `FreeCADCLI.__init__` (history list, context dict, current doc path, active session metadata) plus the `show_history()` method into a `SessionManager` class

10c — Extract cli/display.py:
1. Create `cli/display.py`
2. Move the output-only methods that have no side effects other than printing: `show_help()` (line 922), `show_state()` (line 758), `show_file_info()` (line 1093), `show_save_info()` (line 1119), `show_websocket_status()` (line 1225), `show_persistent_gui_status()` (line 1252), `_display_workflow_results()` (line 631)
3. These can be standalone functions accepting state as arguments, or a `DisplayManager` class

10d — Extract cli/commands.py:
1. Create `cli/commands.py`
2. Move the command-execution methods: `execute_command()` (line 458), `execute_deepseek_command()` (line 795), `_use_direct_deepseek_api()` (line 804), `_execute_generated_code()` (line 867), `execute_script()` (line 742), `execute_complex_shape()` (line 1151), `analyze_state()` (line 768)
3. These methods need the LLM manager and FreeCAD client — accept them as constructor arguments

10e — Create cli/app.py:
1. Create `cli/app.py` with the trimmed `FreeCADCLI` class
2. Replace the extracted methods with delegation calls to `SessionManager`, `DisplayManager`, and the command functions
3. `FreeCADCLI` should now contain only: `__init__`, `initialize()`, `interactive_mode()`, `cleanup()`, `_start_websocket_server()`
4. Target: ~300 lines

10f — Update __main__.py and delete the old file:
1. Confirm `from ai_designer.cli import FreeCADCLI` still works (it should — via the `__init__.py` re-export)
2. Delete `src/ai_designer/cli.py`
3. Run: `make test-unit` — fix any import errors
Verification: `python -m ai_designer --help` runs without error. `wc -l src/ai_designer/cli/app.py` reports under 350 lines.
## Task 11: Reduce state_llm_integration.py

Files created: `src/ai_designer/core/state_analyzer.py`
Files modified: `src/ai_designer/core/state_llm_integration.py`
Prerequisites: Task 10
Effort: ~1 day
Steps:
1. Open `state_llm_integration.py` and read through all methods end-to-end before touching anything
2. Identify the methods that duplicate what the `PlannerAgent`, `GeneratorAgent`, and `ValidatorAgent` now do — mark each with `# DEPRECATED: Use agents.planner / agents.generator / agents.validator instead`; do not delete yet
3. Identify state-analysis methods that are genuinely unique and not covered by agents — things that read and diff the live FreeCAD document state or build context summaries for prompts
4. Create `src/ai_designer/core/state_analyzer.py` with a `StateAnalyzer` class; move only the still-unique methods here
5. In `state_llm_integration.py`, replace the moved methods with delegation calls to `StateAnalyzer`
6. Identify any prompt-building logic that is not already in `agents/prompts/` — move unique prompts to `agents/prompts/system_prompts.py` as new named constants
7. Target: `state_llm_integration.py` down to ~400 lines
8. Run: `make test-unit` after each batch of moves — fix regressions immediately
Verification: `wc -l src/ai_designer/core/state_llm_integration.py` reports under 500 lines.
## Task 12: Reduce deepseek_client.py

Files created: `src/ai_designer/llm/providers/__init__.py`, `src/ai_designer/llm/providers/deepseek.py`
Files modified: `src/ai_designer/llm/deepseek_client.py`, `src/ai_designer/llm/unified_manager.py`
Prerequisites: Task 9
Effort: ~1 day
Steps:
1. Open `deepseek_client.py` and scan all methods end-to-end before touching anything
2. Identify the logic that is truly DeepSeek-specific and NOT handled by litellm's native DeepSeek support — likely: Ollama's HTTP API format, R1 reasoning-chain extraction from `<think>` tags, streaming thought-process parsing
3. Create the `src/ai_designer/llm/providers/` directory with an `__init__.py`
4. Create `src/ai_designer/llm/providers/deepseek.py` — move only the unique Ollama/R1 logic here (target ~200 lines): the raw HTTP client calls to the Ollama API, the `<think>` tag extraction function, and the timeout/retry logic specific to local Ollama
5. In `deepseek_client.py`, replace the moved methods with delegation to `providers/deepseek.py` and mark everything that duplicates litellm with `# DEPRECATED: litellm handles this via 'deepseek/...' model prefix`
6. In `unified_manager.py`, ensure the code paths that need Ollama-specific behavior now use `providers/deepseek.py`; the cloud DeepSeek API falls through to `UnifiedLLMProvider`
7. Run: `make test-unit`
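The `<think>`-tag extraction named in step 2 can be sketched as below. The function name `split_reasoning` and the `(reasoning, answer)` return shape are assumptions for illustration, not the repo's actual API:

```python
import re

# R1 emits its chain-of-thought wrapped in <think>...</think> blocks
# ahead of (or interleaved with) the final answer text.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple:
    """Return (reasoning, answer): the concatenated <think> block
    contents, and the text with those blocks removed."""
    reasoning = "\n".join(m.strip() for m in _THINK_RE.findall(raw))
    answer = _THINK_RE.sub("", raw).strip()
    return reasoning, answer
```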
Verification: `wc -l src/ai_designer/llm/deepseek_client.py` reports under 400 lines. The import test passes.
## Task 13: Reduce state_aware_processor.py

Files created: `src/ai_designer/freecad/workflow_templates.py`, `src/ai_designer/freecad/geometry_helpers.py`, `src/ai_designer/freecad/state_diff.py`
Files modified: `src/ai_designer/freecad/state_aware_processor.py`
Prerequisites: Task 11
Effort: ~1.5 days
Steps:
13a — Extract workflow_templates.py:
1. Identify all hardcoded workflow template strings, script templates, and operation-sequence definitions (Box, Cylinder, Loft, Sweep, etc.)
2. Create `src/ai_designer/freecad/workflow_templates.py` — move them as module-level constants or a `WorkflowTemplate` dataclass
3. In `state_aware_processor.py`, import from `workflow_templates`
4. Run: `make test-unit` — fix regressions

13b — Extract geometry_helpers.py:
1. Identify pure geometry utility functions: bounding-box calculations, volume estimation, face counting, shape-type detection, tolerance comparisons
2. Create `src/ai_designer/freecad/geometry_helpers.py` as a collection of standalone functions (no class wrapper needed)
3. Replace inline geometry calculations in `state_aware_processor.py` with calls to these helpers
4. Run: `make test-unit`

13c — Extract state_diff.py:
1. Identify the methods that compare two document state dicts: added/removed objects, changed features, changed constraints
2. Create `src/ai_designer/freecad/state_diff.py` with a `StateDiff` dataclass and a `compute_diff(before: dict, after: dict) -> StateDiff` function
3. Replace inline state-comparison logic with calls to `compute_diff()`
4. Run: `make test-unit`
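The `StateDiff`/`compute_diff` pair from 13c can be sketched as follows. The exact state-dict layout is an assumption: here a state maps object names to property dicts, and "changed" means any property difference:

```python
from dataclasses import dataclass, field


@dataclass
class StateDiff:
    added: list = field(default_factory=list)    # names only in `after`
    removed: list = field(default_factory=list)  # names only in `before`
    changed: list = field(default_factory=list)  # present in both, differ

    @property
    def is_empty(self) -> bool:
        return not (self.added or self.removed or self.changed)


def compute_diff(before: dict, after: dict) -> StateDiff:
    before_keys, after_keys = set(before), set(after)
    return StateDiff(
        added=sorted(after_keys - before_keys),
        removed=sorted(before_keys - after_keys),
        changed=sorted(k for k in before_keys & after_keys
                       if before[k] != after[k]),
    )
```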
Verification: `wc -l src/ai_designer/freecad/state_aware_processor.py` reports under 600 lines. `make test-unit` passes.
## Task 14: Docker production build and compose stack

Files created: `docker/Dockerfile.production`, `docker/Dockerfile.dev`, `.dockerignore`
Files modified: `docker-compose.yml`, `Makefile`
Prerequisites: Tasks 1–13
Effort: ~1.5 days
Steps:
14a — Create .dockerignore:
1. Create `.dockerignore` in the repo root
2. Add: `__pycache__/`, `*.pyc`, `*.pyo`, `.git/`, `.env`, `venv/`, `htmlcov/`, `outputs/`, `*.FCStd`, `tests/`, `docs/`, `.pytest_cache/`, `node_modules/`

14b — Create docker/Dockerfile.production:
1. Create the `docker/` directory
2. Write a two-stage build:
   - Stage 1 (`builder`): from `python:3.11-slim`, install build tools, copy `pyproject.toml`, run `pip install --no-cache-dir .`
   - Stage 2 (`runtime`): from `python:3.11-slim`, install FreeCAD runtime deps via apt (`freecadcmd`, `libocct-modeling-data-dev`), create a non-root user `freecad` with UID/GID 1000 via `useradd -m -u 1000 -g 1000 freecad`, copy the installed packages from the builder stage, copy `src/` and `config/`, set `WORKDIR /app`, switch to `USER freecad`, expose `8000`, set `CMD ["uvicorn", "ai_designer.api.app:create_app", "--factory", "--host", "0.0.0.0", "--port", "8000"]`
3. Set env vars in the Dockerfile: `PYTHONUNBUFFERED=1`, `PYTHONDONTWRITEBYTECODE=1`, `FREECAD_HEADLESS=1`

14c — Create docker/Dockerfile.dev:
1. Single stage from `python:3.11`
2. Install dev deps with `pip install -e ".[dev]"`; mount the source code as a volume (not COPY); run with `--reload`

14d — Rewrite docker-compose.yml:
1. Define three services:
   - `redis`: image `redis:7-alpine`, port `6379:6379`, healthcheck `redis-cli ping` every 10s, restart `unless-stopped`, volume `redis_data:/data`, command `redis-server --appendonly yes`
   - `api`: build `./` + `docker/Dockerfile.production`, depends on `redis` with `condition: service_healthy`, port `8000:8000`, `env_file: .env`, env vars `REDIS_HOST=redis` + `REDIS_PORT=6379`, healthcheck on `GET /health` every 15s, `mem_limit: 2g`, `cpus: "2"`, volume `./outputs:/app/outputs`
   - `redis-commander`: under `profiles: [dev]` so it only starts with `docker compose --profile dev up`
2. Define the named volume `redis_data`

14e — Update Makefile:
1. Add `docker-build`: `docker build -f docker/Dockerfile.production -t freecad-ai-designer .`
2. Add `docker-run`: `docker compose up -d`
3. Add `docker-stop`: `docker compose down`
4. Add `docker-logs`: `docker compose logs -f api`
Verification: `docker build -f docker/Dockerfile.production -t freecad-ai-designer .` succeeds. `docker compose up -d && curl http://localhost:8000/health` returns `{"status": "ok"}`.
## Task 15: Auth and rate-limiting middleware

Files created: `src/ai_designer/api/middleware/__init__.py`, `src/ai_designer/api/middleware/auth.py`, `src/ai_designer/api/middleware/rate_limit.py`
Files modified: `src/ai_designer/api/app.py`, `src/ai_designer/api/deps.py`, `pyproject.toml`, `.env.example`
Prerequisites: Task 14
Effort: ~1 day
Steps:
15a — Add dependencies to pyproject.toml:
1. Add `python-jose[cryptography]>=3.3.0` and `passlib[bcrypt]>=1.7.4`
2. Run `pip install -e ".[dev]"` to install

15b — Create middleware/auth.py:
1. Create the `src/ai_designer/api/middleware/` directory with an empty `__init__.py`
2. Read `AUTH_ENABLED` from env (default `false`) — when false, the middleware is a no-op passthrough
3. Read `AUTH_API_KEYS` (comma-separated list of valid keys), `JWT_SECRET_KEY`, `JWT_ALGORITHM = "HS256"`, `JWT_ACCESS_TOKEN_EXPIRE_MINUTES = 15` from env
4. Define `JWTAuthMiddleware` as a Starlette `BaseHTTPMiddleware` subclass: extract the `Authorization: Bearer <token>` header, decode with python-jose, return `401` if invalid or expired; skip auth for `/health`, `/ready`, `/docs`, `/redoc`, `/openapi.json`, `/metrics`
5. Add `AUTH_ENABLED`, `AUTH_API_KEYS`, `JWT_SECRET_KEY` to `.env.example`

15c — Create middleware/rate_limit.py:
1. Implement a Redis-backed sliding-window rate limiter
2. Config from env: `RATE_LIMIT_REQUESTS = 100`, `RATE_LIMIT_WINDOW_SECONDS = 60`
3. Logic: `ZADD key timestamp timestamp`, `ZREMRANGEBYSCORE key 0 (now - window)`, `ZCARD key`; if the count exceeds the limit, return `429` with a `Retry-After` header; `EXPIRE key window`
4. When Redis is unavailable, fail open: log a warning and allow the request
5. Add `RATE_LIMIT_REQUESTS`, `RATE_LIMIT_WINDOW_SECONDS` to `.env.example`
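The sliding-window check in 15c can be sketched as a single function. A `redis.Redis` client is expected in production; the sketch accepts any object with `zadd`/`zremrangebyscore`/`zcard`/`expire` methods so it runs against an in-memory fake, and the `is_allowed` name and `now` parameter are illustrative, not the repo's API:

```python
import logging
import time
from typing import Optional

logger = logging.getLogger("rate_limit")


def is_allowed(client, key: str, limit: int = 100, window: int = 60,
               now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    try:
        client.zadd(key, {str(now): now})              # record this request
        client.zremrangebyscore(key, 0, now - window)  # drop expired entries
        count = client.zcard(key)                      # requests in window
        client.expire(key, window)                     # let idle keys die
        return count <= limit
    except Exception as exc:
        # Fail open per step 4: availability beats strict limiting.
        logger.warning("Redis unavailable, allowing request: %s", exc)
        return True
```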
15d — Register both middleware in app.py:
1. Import `JWTAuthMiddleware` and `RateLimitMiddleware`
2. In `create_app()`, add `app.add_middleware(JWTAuthMiddleware)` then `app.add_middleware(RateLimitMiddleware)` — Starlette runs the most recently added middleware first, so this order makes rate limiting outermost
3. The middleware classes self-disable via their `_ENABLED` env vars; no conditional logic is needed in `app.py`

15e — Update deps.py:
1. Find the `verify_api_key` stub (currently a TODO)
2. Replace its body with a call to `auth.verify_api_key(api_key)`
Verification: With `AUTH_ENABLED=false` (the default), `make test-integration` passes. With `AUTH_ENABLED=true` and no token, `curl http://localhost:8000/api/v1/design` returns `401`.
## Task 16: Prometheus metrics

Files created: `src/ai_designer/core/metrics.py`
Files modified: `src/ai_designer/api/app.py`, `src/ai_designer/orchestration/nodes.py`, `src/ai_designer/core/llm_provider.py`, `pyproject.toml`
Prerequisites: Task 15
Effort: ~1 day
Steps:
16a — Add dependencies:
1. Add `prometheus-client>=0.19.0` to `pyproject.toml`

16b — Create core/metrics.py with these module-level Prometheus objects:
- `design_requests_total = Counter("design_requests_total", "Total design requests", ["status"])` — labels: `success`, `failure`, `timeout`
- `design_duration_seconds = Histogram("design_duration_seconds", "End-to-end pipeline duration")`
- `agent_call_duration_seconds = Histogram("agent_call_duration_seconds", "Per-agent call duration", ["agent_name"])`
- `llm_tokens_used_total = Counter("llm_tokens_used_total", "Total tokens consumed", ["provider", "model"])`
- `llm_cost_usd_total = Counter("llm_cost_usd_total", "Total LLM cost in USD", ["provider"])`
- `active_designs = Gauge("active_designs", "Currently running pipelines")`

16c — Add /metrics endpoint to app.py:
1. Import `prometheus_client.generate_latest` and `CONTENT_TYPE_LATEST`
2. Add `@app.get("/metrics")` returning `Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)`
3. Add `/metrics` to the auth middleware's exclusion-path list

16d — Instrument pipeline nodes in orchestration/nodes.py:
1. Wrap each agent call with `with agent_call_duration_seconds.labels(agent_name="...").time():`
2. Call `active_designs.inc()` at pipeline start and `active_designs.dec()` at the end
3. Increment `design_requests_total.labels(status="success"/"failure")` based on the outcome

16e — Instrument the LLM provider in core/llm_provider.py:
1. After each successful call, read `response.usage` and call `llm_tokens_used_total.labels(provider=..., model=...).inc(tokens)`
2. Call `llm_cost_usd_total.labels(provider=...).inc(cost_usd)` using the cost computed in Task 7
Verification: `curl http://localhost:8000/metrics` returns Prometheus text format. `design_requests_total`, `agent_call_duration_seconds`, and `active_designs` are visible.
## Task 17: Locust load tests

Files created: `tests/load/__init__.py`, `tests/load/locustfile.py`, `tests/load/scenarios.py`
Files modified: `Makefile`
Prerequisites: Task 14 (Docker setup must exist)
Effort: ~4 hours
Steps:
17a — Add `locust>=2.20.0` to the dev dependencies in `pyproject.toml`

17b — Create tests/load/locustfile.py:
1. Define `DesignUser(HttpUser)` with `wait_time = between(1, 5)`
2. Task weight 3 — `create_simple_design`: POST to `/api/v1/design` with prompt `"Create a simple box 100x50x30mm"` and `max_iterations=2`; use `catch_response=True` and mark as failure if the status is not `202`; store the returned `request_id` in a list for use by the status task
3. Task weight 1 — `check_health`: GET `/health`; mark as failure if the status is not `200`
4. Task weight 1 — `get_design_status`: GET `/api/v1/design/{id}` using a random ID from the stored list; skip gracefully if no IDs are available yet

17c — Create tests/load/scenarios.py:
1. Document three named load profiles as plain comments/constants:
   - `STEADY_STATE`: 100 users, spawn rate 10/s, duration 5 minutes
   - `COMPLEX_WORKLOAD`: 50 users, spawn rate 5/s, prompts are multi-feature assemblies
   - `SPIKE`: ramp from 0 to 200 users in 60 seconds, sustain for 2 minutes
2. Document the success criteria: P95 latency < 10s, error rate < 1%

17d — Update Makefile:
1. Add a `load-test` target: `locust -f tests/load/locustfile.py --host=http://localhost:8000 --users=100 --spawn-rate=10 --run-time=5m --headless --html=outputs/load_test_report.html`
2. Add a `load-test-ui` target: the same without `--headless`, to open the Locust web UI
Verification: `make load-test` completes without crashing the process. `outputs/load_test_report.html` is generated.
## Week-by-week schedule

| Week | Days | Tasks | Description |
|---|---|---|---|
| Week 1 | Day 1 | 1, 2 | Remove leaked key + hardcoded paths |
| Week 1 | Day 2 | 3, 4 | Schema consolidation |
| Week 1 | Day 3–4 | 5 | BaseAgent + refactor all agents |
| Week 1 | Day 5 | 6, 7 | model_config + cost tracking |
| Week 2 | Day 1 | 8 | Streaming (complete_stream + WS) |
| Week 2 | Day 2–3 | 9 | Unify LLM manager |
| Week 2 | Day 4–5 | 10 | Split cli.py |
| Week 3 | Day 1–2 | 11 | Reduce state_llm_integration |
| Week 3 | Day 3–4 | 12 | Reduce deepseek_client |
| Week 3 | Day 4–5 | 13 | Reduce state_aware_processor |
| Week 4 | Day 1–2 | 14 | Docker + compose |
| Week 4 | Day 3–4 | 15 | Auth + rate limiting |
| Week 4 | Day 5 | 16 | Prometheus metrics |
| Week 5 | Day 1 | 17 | Locust load tests |
Total estimated effort: ~3 weeks active development for one developer.
## Before / after summary

| Metric | Before | After |
|---|---|---|
| Leaked secrets | 1 | 0 |
| Hardcoded machine paths | 5 | 0 |
| God classes (>1,000 lines) | 4 (6,290 lines) | 0 |
| Shared schema files | 3 / 5 | 5 / 5 |
| Abstract base agent | Missing | agents/base.py |
| LLM cost tracking | Stub | Live per-call |
| Streaming support | None | complete_stream() + WS |
| Unified LLM layer | Dual stack | Single litellm provider |
| Per-agent model config | Hardcoded strings | llm/model_config.py |
| Production Dockerfile | None | Multi-stage, non-root user |
| Auth middleware | TODO stub | JWT + API key, togglable |
| Rate limiting | None | Redis sliding window |
| Metrics endpoint | None | /metrics Prometheus format |
| Load tests | None | Locust, 3 scenarios |
Each task references exact file paths and line numbers from the live codebase as of February 20, 2026.