Chris0Jeky
diff --git a/docs/IMPLEMENTATION_MASTERPLAN.md b/docs/IMPLEMENTATION_MASTERPLAN.md
Lines changed: 1285 additions & 1290 deletions
diff --git a/docs/STATUS.md b/docs/STATUS.md
Lines changed: 1141 additions & 1145 deletions
diff --git a/docs/decisions/ADR-0026-cloud-cost-observability.md b/docs/decisions/ADR-0026-cloud-cost-observability.md
Lines changed: 71 additions & 0 deletions
diff --git a/docs/decisions/INDEX.md b/docs/decisions/INDEX.md
Lines changed: 17 additions & 17 deletions
@@ -0,0 +1,71 @@
+# ADR-0026: Cloud Cost Observability and Budget Guardrails
+
+- **Status**: Accepted
+- **Date**: 2026-04-09
+- **Deciders**: Project maintainers
+
+## Context
+
+Taskdeck is transitioning from a purely local-first SQLite tool to a cloud-hosted deployment model (see ADR-0014, platform expansion strategy). Cloud hosting introduces ongoing variable costs that do not exist in local-first operation: compute instances, LLM API calls, storage growth, logging/telemetry volume, network egress, and DNS/domain hosting.
+
+Three characteristics make proactive cost observability essential:
+
+1. **LLM API calls are high-variance**: A single user session with tool-calling can generate 5+ provider round-trips. OpenAI GPT-4o-mini and Gemini 2.5 Flash have different pricing structures, so they must be tracked separately rather than treated as equivalent. The GPT-4o-mini reference model in SPIKE_618 cost roughly $0.00088 per 3-round conversation, but that estimate is only a baseline.
+
+2. **Local-first heritage means no existing cloud cost discipline**: The team has never operated cloud infrastructure at scale. Without explicit budget guardrails, cost surprises are likely during the v0.2.0 cloud launch.
+
+3. **Several features have high-variance cost scaling**: LLM token consumption grows faster than request count when tool-calling multiplies per-message cost; logging volume scales with request count and verbosity configuration; and database storage grows continuously as the audit trail accumulates. Even linearly scaling features such as SignalR connections become cost-relevant at scale.
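
Review note: to make the baseline in item 1 concrete, here is a rough projection sketch. It assumes spend scales linearly with conversation count, which the ADR itself warns is optimistic once tool-calling multiplies per-message cost. The only figure taken from the ADR is the $0.00088 baseline; the function name, `conversations_per_day`, `variance_multiplier`, and the 30-day month are hypothetical and only for illustration.

```python
# Baseline from the ADR: GPT-4o-mini, roughly $0.00088 per 3-round
# conversation (SPIKE_618). Everything else here is an assumption.
BASELINE_USD_PER_CONVERSATION = 0.00088


def projected_monthly_llm_cost(
    conversations_per_day: float,
    variance_multiplier: float = 1.0,
    days_in_month: int = 30,
) -> float:
    """Return a linear projection of monthly LLM spend in USD.

    variance_multiplier is a hypothetical knob for "what if real
    conversations cost N times the baseline" (for example, more
    tool-calling rounds than the 3-round reference conversation).
    """
    if conversations_per_day < 0:
        raise ValueError("conversations_per_day must not be negative")
    if variance_multiplier <= 0 or days_in_month <= 0:
        raise ValueError("variance_multiplier and days_in_month must be positive")
    return (
        BASELINE_USD_PER_CONVERSATION
        * conversations_per_day
        * days_in_month
        * variance_multiplier
    )
```

Under these assumptions, 1,000 baseline conversations per day project to about $26.40 per month, and doubling the multiplier doubles that figure. The value of the sketch is the sensitivity check, not the absolute number, which needs calibration against production data.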
+
+Issue #104 (OPS-12) requires establishing cost visibility, budget alerting, and mitigation playbooks before cloud deployment begins.
+
+## Decision
+
+Establish a proactive cloud cost observability framework with three layers:
+
+1. **Cost telemetry and dashboards**: Define cost dimensions (compute, storage, LLM API, logging, network, CI/CD), track them through cloud provider billing APIs and application-level metrics, and maintain a monthly cost review workflow.
+
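+2. **Budget alert thresholds**: Implement tiered alerting at 70% (warning), 90% (critical), and 100% (hard cap) of the monthly budget. The 100% tier marks the budget ceiling and triggers graduated mitigation rather than an automatic shutdown (see Alternatives Considered). Alerts route to documented owners with escalation paths.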
+
+3. **Feature-level cost hotspot registry**: Maintain a living document mapping high-variance features to their cost drivers, scaling behavior, mitigation levers, and action owners. This registry is reviewed monthly alongside the cost dashboard.
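
Review note: the tiered thresholds in item 2 could be expressed along these lines. This is a minimal sketch, not the project's implementation; `AlertLevel`, `THRESHOLDS`, and `classify_spend` are hypothetical names, and only the 70% / 90% / 100% tiers come from the ADR.

```python
from enum import Enum


class AlertLevel(Enum):
    OK = "ok"
    WARNING = "warning"    # 70% of monthly budget
    CRITICAL = "critical"  # 90% of monthly budget
    HARD_CAP = "hard_cap"  # 100% of monthly budget


# Tiers from the ADR, ordered highest first so the first match wins.
THRESHOLDS = (
    (1.00, AlertLevel.HARD_CAP),
    (0.90, AlertLevel.CRITICAL),
    (0.70, AlertLevel.WARNING),
)


def classify_spend(month_to_date_usd: float, monthly_budget_usd: float) -> AlertLevel:
    """Map month-to-date spend to the highest alert tier it has reached."""
    if monthly_budget_usd <= 0:
        raise ValueError("monthly_budget_usd must be positive")
    ratio = month_to_date_usd / monthly_budget_usd
    for threshold, level in THRESHOLDS:
        if ratio >= threshold:
            return level
    return AlertLevel.OK
```

For a hypothetical $100 budget, `classify_spend(75, 100)` returns `AlertLevel.WARNING` and `classify_spend(100, 100)` returns `AlertLevel.HARD_CAP`. Routing each level to an owner and an escalation path is left to the runbook.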
+
+Supporting artifacts:
+- `docs/ops/CLOUD_COST_OBSERVABILITY.md` - framework, dimensions, review workflow
+- `docs/ops/COST_HOTSPOT_REGISTRY.md` - feature-level cost risk tracking
+- `docs/ops/BUDGET_BREACH_RUNBOOK.md` - detection-to-resolution playbook
+
+## Alternatives Considered
+
+- **Reactive-only cost management**: Wait for cost surprises and address them as incidents. Rejected because LLM API costs can spike rapidly (a bug enabling unbounded tool-calling loops could exhaust a monthly budget in hours) while cloud provider billing data typically lags usage by 4-24 hours, so a billing-only signal can arrive after the budget is already spent.
+
+- **Third-party cost management platform (e.g., Kubecost, Vantage, CloudHealth)**: Adds operational complexity and cost. The current single-node deployment (see `docs/ops/DEPLOYMENT_TERRAFORM_BASELINE.md`) does not justify a dedicated cost management tool. Revisit when multi-node or multi-cloud deployment is in scope.
+
+- **Cloud provider native budgets only (AWS Budgets)**: Necessary but insufficient. AWS Budgets alone cannot correlate application-level behavior (e.g., which feature or user is driving LLM cost) with billing data. The framework uses provider budgets as the alerting backbone while adding application-level cost attribution.
+
+- **Hard spending caps with automatic shutdown**: Too aggressive for a product with active users. The framework uses graduated mitigation (rate-limit, degrade, scale-down) rather than hard shutdown, preserving non-LLM functionality during cost incidents.
+
+## Consequences
+
+**Positive**:
+- Cost surprises during the v0.2.0 cloud launch are caught early through tiered alerts.
+- Monthly review cadence creates institutional knowledge about cost trends before they become emergencies.
+- Feature owners have explicit accountability for cost-impacting decisions.
+- Budget breach runbook reduces mean-time-to-mitigate for cost incidents.
+
+**Negative**:
+- Monthly review workflow adds operational overhead (estimated 30-60 minutes per review).
+- Cost estimates in the hotspot registry are approximations that require calibration against real production data.
+- Alert thresholds may need tuning during initial cloud operation: thresholds that are too sensitive cause alert fatigue, while thresholds that are too loose defeat the purpose.
+
+**Neutral**:
+- Cost observability artifacts become part of the ops documentation surface that must be maintained alongside infrastructure changes.
+- The framework is cloud-provider-aware (AWS-focused, given the Terraform baseline), but the principles are portable.
+
+## References
+
+- Issue: #104 (OPS-12: Cloud cost observability and budget-guardrail automation)
+- Terraform baseline: `docs/ops/DEPLOYMENT_TERRAFORM_BASELINE.md` (#102)
+- Observability baseline: `docs/ops/OBSERVABILITY_BASELINE.md` (#68)
+- LLM cost context: `docs/spikes/SPIKE_618_COMPLETED.md` (tool-calling cost model)
+- Managed-key quota policy: `docs/security/MANAGED_KEY_USAGE_POLICY.md` (#240)
+- Platform expansion strategy: ADR-0014
+- Disaster recovery runbook: `docs/ops/DISASTER_RECOVERY_RUNBOOK.md` (#86)
@@ -5,29 +5,29 @@
 | [0001](ADR-0001-clean-architecture-layering.md) | Clean Architecture Layering | Accepted | 2025 |
 | [0002](ADR-0002-claims-first-identity.md) | Claims-First Identity Model | Accepted | 2026-01 |
 | [0003](ADR-0003-proposal-first-automation.md) | Proposal-First Automation (Review-First Safety) | Accepted | 2026-02-23 |
-| [0004](ADR-0004-multi-tenancy-shared-schema.md) | Multi-Tenancy — Shared Schema + TenantId | Accepted | 2026-02-22 |
-| [0005](ADR-0005-capture-model-queue-wrapper.md) | Capture Model — Queue-Wrapper MVP | Accepted | 2026-02-23 |
-| [0006](ADR-0006-llm-provider-mock-default.md) | LLM Provider — Mock-Default with Config-Gated Live Providers | Accepted | 2026-02 |
+| [0004](ADR-0004-multi-tenancy-shared-schema.md) | Multi-Tenancy - Shared Schema + TenantId | Accepted | 2026-02-22 |
+| [0005](ADR-0005-capture-model-queue-wrapper.md) | Capture Model - Queue-Wrapper MVP | Accepted | 2026-02-23 |
+| [0006](ADR-0006-llm-provider-mock-default.md) | LLM Provider - Mock-Default with Config-Gated Live Providers | Accepted | 2026-02 |
 | [0007](ADR-0007-stable-error-contracts.md) | Stable Error Contracts (ApiErrorResponse) | Accepted | 2026-01 |
 | [0008](ADR-0008-novice-first-product-legibility.md) | Novice-First Product Legibility Before Breadth | Accepted | 2026-03-07 |
-| [0009](ADR-0009-session-token-storage.md) | Session Token Storage — localStorage with Mitigations | Accepted | 2026-03-28 |
-| [0010](ADR-0010-frontend-primitive-stack-shadcn-vue.md) | Frontend Primitive Stack — shadcn-vue | Accepted | 2026-03-28 |
-| [0011](ADR-0011-design-tokens-obsidian-ember.md) | Design Token System — Obsidian & Ember Theme | Accepted | 2026-02-23 |
+| [0009](ADR-0009-session-token-storage.md) | Session Token Storage - localStorage with Mitigations | Accepted | 2026-03-28 |
+| [0010](ADR-0010-frontend-primitive-stack-shadcn-vue.md) | Frontend Primitive Stack - shadcn-vue | Accepted | 2026-03-28 |
+| [0011](ADR-0011-design-tokens-obsidian-ember.md) | Design Token System - Obsidian & Ember Theme | Accepted | 2026-02-23 |
 | [0012](ADR-0012-signalr-realtime-with-polling-fallback.md) | SignalR Realtime with Polling Fallback | Accepted | 2026-02 |
-| [0013](ADR-0013-ci-topology-reusable-workflows.md) | CI Topology — Reusable Workflow Decomposition | Accepted | 2026-03 |
-| [0014](ADR-0014-platform-expansion-four-pillars.md) | Platform Expansion — Four Pillars | Proposed | 2026-03-29 |
-| [0015](ADR-0015-starter-pack-idempotent-apply.md) | Starter Pack — Idempotent Apply with Conflict Detection | Accepted | 2026-02 |
+| [0013](ADR-0013-ci-topology-reusable-workflows.md) | CI Topology - Reusable Workflow Decomposition | Accepted | 2026-03 |
+| [0014](ADR-0014-platform-expansion-four-pillars.md) | Platform Expansion - Four Pillars | Proposed | 2026-03-29 |
+| [0015](ADR-0015-starter-pack-idempotent-apply.md) | Starter Pack - Idempotent Apply with Conflict Detection | Accepted | 2026-02 |
 | [0016](ADR-0016-security-logging-redaction.md) | Security Logging Redaction for Sensitive Flows | Accepted | 2026-02-23 |
-| [0017](ADR-0017-agent-tool-registry-review-first.md) | Agent Tool Registry — Review-First by Default | Accepted | 2026-03 |
-| [0018](ADR-0018-llm-tool-calling-custom-over-semantic-kernel.md) | LLM Tool-Calling — Custom Implementation over Semantic Kernel | Accepted | 2026-04-01 |
-| [0019](ADR-0019-mcp-server-official-sdk-embedded-hosting.md) | MCP Server — Official SDK with Embedded Hosting | Accepted | 2026-04-01 |
+| [0017](ADR-0017-agent-tool-registry-review-first.md) | Agent Tool Registry - Review-First by Default | Accepted | 2026-03 |
+| [0018](ADR-0018-llm-tool-calling-custom-over-semantic-kernel.md) | LLM Tool-Calling - Custom Implementation over Semantic Kernel | Accepted | 2026-04-01 |
+| [0019](ADR-0019-mcp-server-official-sdk-embedded-hosting.md) | MCP Server - Official SDK with Embedded Hosting | Accepted | 2026-04-01 |
 | [0020](ADR-0020-plugin-extension-architecture.md) | Plugin/Extension Architecture RFC and Sandboxing Constraints | Proposed | 2026-04-01 |
-| [0021](ADR-0021-jwt-invalidation-user-active-middleware.md) | JWT Invalidation — User-Active Middleware over Token Blocklist | Accepted | 2026-04-03 |
-| [0022](ADR-0022-analytics-export-csv-first-pdf-deferred.md) | Analytics Export — CSV First, PDF Deferred | Accepted | 2026-04-08 |
+| [0021](ADR-0021-jwt-invalidation-user-active-middleware.md) | JWT Invalidation - User-Active Middleware over Token Blocklist | Accepted | 2026-04-03 |
+| [0022](ADR-0022-analytics-export-csv-first-pdf-deferred.md) | Analytics Export - CSV First, PDF Deferred | Accepted | 2026-04-08 |
 | [0023](ADR-0023-sqlite-to-postgresql-migration-strategy.md) | SQLite-to-PostgreSQL Migration Strategy | Accepted | 2026-04-09 |
-| [0024](ADR-0024-distributed-caching-cache-aside.md) | Distributed Caching — Cache-Aside with Redis/InMemory Fallback | Accepted | 2026-04-09 |
-| [0025](ADR-0025-signalr-scaleout-redis-backplane.md) | SignalR Scale-Out — Redis Backplane | Accepted | 2026-04-09 |
+| [0024](ADR-0024-distributed-caching-cache-aside.md) | Distributed Caching - Cache-Aside with Redis/InMemory Fallback | Accepted | 2026-04-09 |
+| [0025](ADR-0025-signalr-scaleout-redis-backplane.md) | SignalR Scale-Out - Redis Backplane | Accepted | 2026-04-09 |
 | [0026](ADR-0026-cloud-cost-observability.md) | Cloud Cost Observability and Budget Guardrails | Accepted | 2026-04-09 |
 | [0027](ADR-0027-cloud-target-topology-autoscaling.md) | Cloud Target Topology and Autoscaling Reference Architecture | Accepted | 2026-04-09 |
-| [0028](ADR-0028-staged-deployment-bluegreen-canary.md) | Staged Deployment — Blue/Green with Canary Verification | Accepted | 2026-04-09 |
+| [0028](ADR-0028-staged-deployment-bluegreen-canary.md) | Staged Deployment - Blue/Green with Canary Verification | Accepted | 2026-04-09 |
 | [0029](ADR-0029-oidc-mfa-pluggable-identity.md) | OIDC/SSO Integration with Optional TOTP MFA | Accepted | 2026-04-09 |